Reproducible data science for road safety research
Introduction
This workshop will take place at the University of Leeds’ Institute for Transport Studies (ITS) on 2nd September 2025, 14:00-16:00, the day before the main RS5C conference, which runs from 3rd to 5th September 2025.
The workshop will cover the fundamentals of reproducible data science for road safety research, building on a decade’s worth of experience working with road traffic casualty datasets for policy-relevant road safety research. The UK’s open access STATS19 database will be the basis of the session, but the skills learned will be applicable to any road safety dataset. The session will cover:
- Importing collision, casualty and vehicle tables
- Temporal visualisation and aggregation
- Spatial visualisation and aggregation
- Joining STATS19 tables
- Spatial joins linking infrastructure to collisions
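To give a flavour of the temporal aggregation and table-joining steps listed above, here is a minimal sketch in base R. It uses a handful of made-up records rather than real STATS19 data, and the table and column names (collisions, casualties, collision_id, severity, casualty_type) are assumptions chosen for illustration, not the actual STATS19 schema.

```r
# Toy records with illustrative column names (NOT real STATS19 data)
collisions <- data.frame(
  collision_id = c("c1", "c2", "c3", "c4"),
  date = as.Date(c("2023-01-15", "2023-01-20", "2023-02-03", "2023-02-27")),
  severity = c("Slight", "Serious", "Slight", "Fatal"),
  stringsAsFactors = FALSE
)
casualties <- data.frame(
  collision_id = c("c1", "c2", "c2", "c3", "c4"),
  casualty_type = c("Pedestrian", "Cyclist", "Car occupant", "Cyclist", "Pedestrian"),
  stringsAsFactors = FALSE
)

# Temporal aggregation: count collisions in each calendar month
collisions$month <- format(collisions$date, "%Y-%m")
collisions_per_month <- table(collisions$month)

# Table join: attach collision attributes to every casualty record
casualties_joined <- merge(casualties, collisions, by = "collision_id")

# Cross-tabulate casualty type against collision severity
type_by_severity <- table(
  casualties_joined$casualty_type,
  casualties_joined$severity
)

print(collisions_per_month)
print(type_by_severity)
```

Because one collision can involve several casualties, the joined table has one row per casualty rather than one per collision. The spatial steps in the list follow the same pattern but would typically rely on a spatial package such as sf, as covered in Geocomputation with R.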
The course will be taught in R, a free and open-source programming language for data analysis and visualisation that excels at the kind of statistical modelling and visualisation workflows required for high-impact, reproducible and correct road safety research. The course will be taught by Professor Robin Lovelace, who has over a decade of experience teaching R for data science and is author of the popular book Geocomputation with R. You will learn how to add value to road traffic casualty data to support more data-driven and effective interventions that save lives. Road traffic collisions are the largest cause of death for young people worldwide, as highlighted in the map below.
Road danger levels worldwide in 2016. Data source: World Bank. Reproducible source code: Reproducible Road Safety Research with R, freely available at itsleeds.github.io/rrsrr/.
Who is this for?
The workshop is aimed at anyone interested in road safety research, especially students, researchers, and practitioners who are already working with road safety data and who would like to improve their data science skills for more reproducible and impactful research.
You are welcome to sign up and learn from the session if you are new to R or if you want to work through the practical content in another language such as Python or Julia. However, the session will be most useful if you have some prior experience with R and RStudio (see links below for recommended reading and places to learn R). We highly recommend that attendees already use R or dedicate some time to learning the basics of R before the session.
Prerequisites
Attendees should have the following before signing up:
- Basic familiarity with R and RStudio or expertise in another programming language for data science
- A laptop with R and RStudio installed (or VS Code with the R extension or similar for advanced users)
- A willingness to learn (see recommended reading) and share knowledge
Recommended reading
The session will build on the following resources. We recommend reading one or more of them before signing up:
- The first practical session of the Transport Data Science module, freely available at itsleeds.github.io/tds/p1/, for an introduction to data science for transport research
- Reproducible Road Safety Research with R (Lovelace, 2020)
- Introductory guide for analyzing road safety data in R
- Geocomputation with R and Chapter 13 in particular (Lovelace et al., 2025)
- A paper exploring the spatial distribution of cycling casualties in West Yorkshire, using exploratory data analysis (EDA) techniques (Lovelace et al., 2016)
- A paper exploring social inequalities in cycling casualties nationwide (Vidal Tortosa et al., 2021)
- Papers investigating the relationships between new contraflow interventions and traffic levels and collision rates in London (Tait et al., 2024, 2023)
Where and when
2nd September 2025, 14:00-16:00, Room 1.11, ITS, University of Leeds
Sign-up!
Sign up (£50) on the University of Leeds conference website at eu.eventscloud.com/ereg/newreg.php?eventid=200280778