9 Next steps

You have reached the end of this short guide on reproducible road safety research with R. Armed with the knowledge of what R and RStudio can do, and how add-on packages provide powerful tools for a wide range of data analysis, visualisation and statistical modelling tasks, you should have a much better understanding of the language’s capabilities.

I hope that in the process of working through the exercises, you have learned not only the technicalities of data science with a powerful tool of the trade, but also a way of working that puts reproducibility centre stage. Learning any new skill takes time and effort. However, in my experience, once you get past a critical threshold, the amount of time saved using the new approach starts to outweigh the amount of time involved in becoming fluent. The same concept applies to other ‘tools of the trade’ that are available, such as the open source geographic information system (GIS) software QGIS, and other languages for data science, such as Python and Julia.

Rather than go off and learn such additional tools, we encourage you to stick with R. It is preferable to know one language in depth and then branch out to learn other approaches than to learn many approaches superficially or, as Grolemund and Wickham (2016) put it in the excellent R for Data Science (R4DS) book: “You will get better faster if you dive deep, rather than spreading yourself thinly over many topics”. In terms of next steps, you cannot go wrong with checking out the R4DS website which, like this book, has worked examples and exercises in abundance on a much wider range of data science topics. As you will see by visiting r4ds.had.co.nz, many of these topics, including workflow and modelling, will be of use from a road safety research perspective.

A strength of R is its flexibility. It can be used as a calculator one minute, a statistical modelling toolbox the next, and a web application development framework after that, as illustrated by major shiny apps such as the Propensity to Cycle Tool (try it at www.pct.bike) and tools developed by road safety consultancy Agilysis, described in Section 9.4 below. Indeed, within the R ecosystem there are many sub-ecosystems, each of which has excellent free and open resources for people who want to learn more in a particular domain. If you are particularly interested in the geographic analysis of road crash data, the book Geocomputation with R, co-authored by yours truly and already mentioned in Chapter 7, is highly recommended (Lovelace, Nowosad, and Muenchow 2019). If you are looking for methods of analysing trends and forecasting with time series data, Hyndman and Athanasopoulos (2018) is also highly recommended. Indeed, there is a whole library’s worth of open resources to be found online on any area of data-driven research, from web development and visualisation to text analysis. A recommended next step for learning more about any of these areas is the website bookdown.org, which links to books that can also be bought as physical items if you, like many people, prefer learning with a paper resource.
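To make the ‘calculator one minute, modelling toolbox the next’ point concrete, the short sketch below uses only base R. The unit conversion and the choice of the built-in cars dataset (speed in miles per hour and stopping distance in feet) are illustrative assumptions, not examples taken from earlier chapters.

```r
# R as a calculator: convert 30 miles per hour to kilometres per hour
speed_kmh <- 30 * 1.609344
speed_kmh

# R as a statistical modelling toolbox: fit a linear model of stopping
# distance against speed using the built-in `cars` dataset
model <- lm(dist ~ speed, data = cars)
coef(model)
```

The same session could go on to load an add-on package such as shiny, which is what makes the language useful across such a wide range of tasks.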

In fact, with the size and rapidly evolving nature of the R ecosystem, one of the hardest things for a beginner is knowing which packages, functions or workflow options to choose from out of a wide array of options. The internet is there to help you, but it can also hinder your progress by serving up out-of-date solutions and providing ‘quick fixes’ at the expense of a deep understanding. Therefore, instead of trying to be comprehensive (focussed web searches prioritising tried-and-tested solutions documented in authoritative sources can help with that), the rest of this final section provides pointers on a few particularly useful aspects of R from a road safety analysis perspective. Most people who learn R (or any computer language) will at some point become frustrated due to tricky-to-fix error messages. As outlined in Section 9.4, written by people who have navigated R’s at first daunting learning curve, it can take only a few weeks of learning to get to the point where R saves more time than it consumes and takes your work “to the next level”.

9.1 Automated reporting with RMarkdown

An advantage of R is that it has many packages dedicated to the communication and publication of results, vital for policy impact. Perhaps the most important package with regard to the communication of results is rmarkdown. This is more of a framework than a package, providing a powerful system for generating reports, web pages and even books (this book was written with the bookdown package, which builds on rmarkdown).

This is not the place to explain how to use RMarkdown and its associated .Rmd files. The framework is explained in detail in a free and open book (Xie, Allaire, and Grolemund 2018). To get started with the framework, however, you can try the following code example, which shows the creation of an Rmd file:

file.edit("test-document.Rmd")

Try adding some code chunks and text by following the guidance in the Rmd cheatsheet, which you can get from the Help > Cheatsheets menu in RStudio.
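As a starting point, the sketch below writes a minimal .Rmd file containing a header, a line of text and one code chunk, and renders it to HTML only if the rmarkdown package and pandoc are available. The file contents, the title and the use of the built-in cars dataset are illustrative assumptions rather than anything prescribed by the framework.

```r
# Three backticks, built with strrep() so the chunk delimiters are easy to read
fence <- strrep("`", 3)

# Contents of a minimal R Markdown document (illustrative only)
rmd_lines <- c(
  "---",
  "title: \"Test document\"",
  "output: html_document",
  "---",
  "",
  "Some text describing the analysis.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

# Write the document to a temporary directory
rmd_file <- file.path(tempdir(), "test-document.Rmd")
writeLines(rmd_lines, rmd_file)

# Render to HTML only if rmarkdown and pandoc are installed
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  output_file <- rmarkdown::render(rmd_file, quiet = TRUE)
}
```

In RStudio, the same result can be obtained interactively by opening the file and clicking the Knit button.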

9.2 Sharing code

Another way to increase the impact of your code is to share it. This can help you collaborate with colleagues, get feedback from others, and generate interest in your work as part of collaborative research processes that have been in operation for hundreds of years, as summarised by the phrase, ‘Standing on the shoulders of giants.’ A more prosaic, but perhaps more important, corollary to that is, ‘Do not reinvent the wheel.’ By getting your code ‘out there’ you will enable others to use it and, because publishing your code encourages searching for other code bases, you are more likely to find components written by others that improve your work. Sharing code can therefore save many hours of time, provided you are happy to read and re-use, and of course give due credit and reference to, code and ideas from other people.

The easiest way to share code in 2020 (and likely for the foreseeable future) is GitHub, an online code sharing, project management and file hosting platform. A great way to get started with GitHub, after you have signed up and created a user name at github.com, is to contribute to an existing project. Challenge: suggest a change to the code repository on GitHub that contains the source code of this book.

9.3 Asking questions

A final thing to say on R before the testimonials below is how to ask questions. There are many places to ask for help online, including:

  • The question and answer site https://stackoverflow.com/. You will get quick answers here but be warned: answers may not always be particularly friendly if you ask a question that doesn’t make sense or which has already been answered in the documentation. That should not put you off though. Sometimes it’s a case of, ‘Don’t ask, don’t get.’

  • The https://community.rstudio.com/ forum, where you may get more detailed and friendlier responses, especially if the question relates to RStudio, although the answer may be slower.

  • Special interest groups such as https://gis.stackexchange.com/ (for GIS-related questions), and the Slack group RSGB Analyst Network for road safety data analysis questions.

Perhaps better than all of the above is to ask a colleague who has slightly more experience than you. That way you will build ‘collaboration networks.’ The final thing to say on asking questions is ‘use reprex!’ To see what I mean by this, try typing the following:

# example of creating a good reprex:
reprex::reprex({
  x = 1
  y = "2"
  # why does this fail?
  x + y
  # but this succeeds?
  x + as.numeric(y)
})
# after running the code above you can share the copied output to help ask questions

Note, you can turn any bit of code into a ‘reprex’ by selecting it and by running the ‘Reprex selection’ addin in RStudio, as described on the Tidyverse website.
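The question posed in the reprex above can also be explored directly in base R, without the reprex package. The following sketch, which assumes nothing beyond base R, captures the error with tryCatch() and checks the class of each object to show why the first addition fails and the second succeeds.

```r
x <- 1
y <- "2"

# Adding a number to a character string raises an error; tryCatch() returns
# the error object instead of stopping the script
result_bad <- tryCatch(x + y, error = function(e) e)
inherits(result_bad, "error")
conditionMessage(result_bad)

# Checking the class of each object reveals the cause of the error
class(x)
class(y)

# Converting the character string to a number first makes the addition work
result_ok <- x + as.numeric(y)
result_ok
```

Including this kind of diagnostic output in a question makes it much easier for others to see what you have already tried.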

9.4 Testimonials

This final section provides insight not only into how R can be used for road safety research from a range of perspectives, but also into how to navigate R’s at times steep learning curve.

9.4.1 R for professional road safety analysts

Will Cubbin, Road Safety Strategy Analyst, Safer Essex Roads Partnership

When I attended the two-day course ‘introduction to R’ I had little confidence in my natural ability to learn coding. Although familiar with many functions in Excel and having dabbled in VBA, my two previous attempts at any kind of computer language both ended in literal failure. At university I failed a module on C++ and in a previous job I failed a training course on SQL!

As expected the course was a steep learning curve but after two days I had definitely learned a few tricks. However I was still concerned about the amount of material covered by the course that I hadn’t understood. It turned out this was actually a good thing because the breadth of the course showed me what R was capable of, and how it could be useful. The next part of my journey with R was to use the course materials and build on the basics I had learned, to achieve what the course showed was possible.

I began with the aim of using the geospatial analysis capability in R to visualise collision data in new ways, to give more detailed insight and present it in a way that would inform meaningful action for front line resources. Having a clear goal of “This is what I want to achieve with my first R project” was crucial. The course had given me an idea of the sort of processes I needed to undertake in order to achieve this goal. The post-course support through GitHub was very good. I also learned a lot by finding examples of code on places like GitHub and Stack Overflow through Google searches. The other crucial element in getting my first success with R was having time dedicated to working on the project immediately after the course. I spent two weeks working almost exclusively on this project, starting the week after the R course.

The result was well worth the effort. After the initial two weeks of intensive learning with R, I spent 4 to 6 weeks working on the project a couple of days per week. By the end of this period I had working versions of two interactive mapping tools, comprising: 1) A multi-layered leaflet map showing collision locations, collision density along main roads and a “heatmap” (Kernel Density Estimation raster) layer. I made multiple versions of each layer for different modes of transport and behaviours such as drink driving. 2) A ‘shiny’ mapping app showing collision locations and basic details with a date filter. I was able to embed this on our website for public use. It can be viewed on the SERP website data page under the heading ‘Interactive Map’.

I soon added a second R script to the first of the two projects described above. This script produced and exported a range of standardised infographics showing various breakdowns of the data contained in the map. This allowed me to almost fully-automate the process for updating a proactive Roads-Policing tasking document. It turned this monthly process, which previously took 1 working day to complete, into one taking just 45 minutes. It also added more useful insight to the monthly tasking product.

My next steps are to continue another project using an API to access vehicle telematics data. This project extracts driving events, such as harsh braking and harsh cornering, and plots them on an interactive map. I will also be using R for some statistical analysis as part of a research project I have recently started. Thinking of myself as an “Excel native”, I would say R hasn’t replaced Excel, but has been a powerful addition to my toolbox so I can do more interesting and in-depth work than ever before.

9.4.2 R in a road safety research consultancy

Dr Craig Smith, Data Scientist, Agilysis Ltd

After attending my first R course, I immediately saw how useful the language could be to our team. At Agilysis, we have integrated R into almost everything our analytics team does, pushing for our work to be as robust and reproducible as possible. The incredible integration R has with SQL database engines, cloud-based infrastructure like AWS, and proprietary GIS software like ArcGIS, has given us a huge scope to automate a lot of our regular data processing tasks. The added ability to automate the production of reports, charts and maps has allowed us to gain quick insights into our data, speeding up our exploratory analysis. The huge Shiny ecosystem has allowed us to produce interactive applications for sharing our tools and visualisations with stakeholders. There is also a wide range of open-source statistical and machine learning packages available which, when combined with the added capability to translate and use tools from Python, has allowed us to innovate and take full advantage of what artificial intelligence has to offer. All of this, embedded in a supportive community of R users that is continually sharing its knowledge and helping spread these skills, means that anything is possible for users, whatever their level of experience. The fact that this community includes a growing number of road safety analysts and transport-focused data scientists (thanks, in part, to this book and the previous R for Road Safety courses) means that there are plenty of like-minded R users all over the country that you can share ideas (and code) with.

9.4.3 Using R in a road safety charity

Emily Nagler, Data Analyst, RAC Foundation

I was first introduced to R in my master’s program by a professor who was very enthusiastic as to how much better it was than ArcGIS for spatial analysis. Whilst he did project his views very strongly on us as students, I still tended towards ArcGIS when given a choice between the two. I had never written code before this point, and to me the GUI of ArcGIS was simply easier to work with. However, over the course of my program I incorporated R more and more as our coursework became increasingly complex. Datasets with tens of thousands of rows quickly became a pain in Excel, and the inefficiency of working between two programs rather than one was tedious. Like many of my classmates, I soon realised the point our professor had been trying to make all along: in order to push your analytical capabilities to the next level you need to use the right tool. I can agree that there is a steep learning curve to R, but once you’ve become comfortable with the language, it’s harder to go back than it is to go forward. Meaning, once you reach a point in your proficiency, it’s easier to build on your skills and knowledge of the language than it is to revert back to Excel knowing there is a smarter alternative.

I found myself face to face with R again in my role at the RAC Foundation a year later, and since I’d already become an R convert, it was easy to pick up again. However, now I saw new benefits from a different perspective, relating to collaboration and reproducibility. Before this point, the only eyes on my scripts were my own, and there was no need to refer back to them once an assignment was done. However, in my role as a data analyst, it is important and necessary to create something that can be shared and understood by others. The ability to have a full account written out line by line makes it much easier to work off someone else’s code and, importantly, to quality check one another’s work. By having traceable, repeatable analysis, there is a level of accountability with the method that you simply cannot get in Excel.

Another great aspect of R is the helpful online community presence, thanks to its open source nature. Anyone can access the software, and anyone can contribute to the discussion, which I often find more helpful than the FAQs of paid-for software. I would say that this aspect alone has helped hugely in my work, both academically and professionally. I’m never far from an answer when stuck, no matter how specific the problem gets. In terms of its use in my work on road safety and transport analysis, it has become a great tool for combining data with maps, something that I think highlights results in a more insightful way. That being said, visualisation is just as important as the analysis, and I’ve been pleased with the array of packages that can create eye-catching maps, interactive dashboards, and much more outside of the traditional data wrangling capabilities. Looking to the future of my career, I see myself continuing to use R and I’m confident that its capabilities will only get better the more people engage with it.

9.4.4 Using R in other areas of road safety research

Do you have a use case of reproducible research? Please get in touch on the issue tracker.