Session 2: Getting transport datasets with R
1 Introduction
In this session, we will learn how to get transport datasets using R. The contents of the session are as follows:
- We’ll start with a short lecture on data sources and ways of classifying transport datasets (see the slides)
- Reviewing the homework from the previous session
- Session activities: importing and exploring a range of transport datasets in your own time
- Bonus: exploring the Cadence platform
- Homework for the next session
1.1 Review Homework
You should now be familiar with the basics of R, Quarto and the structure of transport datasets, having completed the homework from the previous session.
We will demonstrate reproducing last week's example and discuss any issues you had running the code in Chapter 13 of Geocomputation with R.
1.2 Prerequisites
Note: you may need to install the pct package as follows:
remotes::install_github("ITSLeeds/pct")
We will also load the following packages:
library(tidyverse)
library(sf)
library(tmap)
Note that this session imports and uses geographic data with the sf package. Read more about the package in Chapter 2 onwards of Geocomputation with R and in the sf package documentation.
2 Getting OpenStreetMap data
Work through the reproducible code in the “Introducing osmextract” vignette hosted at https://docs.ropensci.org/osmextract/articles/osmextract.html.
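As a warm-up, here is a minimal sketch, assuming the osmextract package is installed and you have an internet connection (the West Yorkshire extract is fairly large, so the download may take a few minutes). oe_get() matches a place name, downloads the corresponding .osm.pbf extract and reads the requested layer as an sf object:
library(osmextract)
# Download and read the 'lines' layer (roads, paths, railways) for West Yorkshire:
wy_lines = oe_get("West Yorkshire", layer = "lines")
# Inspect the most common highway types in the extract:
sort(table(wy_lines$highway), decreasing = TRUE)[1:10]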
2.1 Bonus exercises
- Reproduce the examples
- Get all supermarkets in OSM for West Yorkshire
- Identify all cycleways in West Yorkshire and, using the stats19 data you have already downloaded, identify all crashes that happened near them (one way to approach this is sketched below).
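One way to approach the last exercise is with a spatial filter. The sketch below is hedged: it assumes the wy_lines object from the example above and an sf object of crashes called crashes_sf from your stats19 homework (both names are placeholders for whatever you called your objects), and the 100 m threshold is an arbitrary choice:
# Keep only cycleways and project them to British National Grid (EPSG:27700),
# the CRS used by stats19 data formatted with format_sf(..., lonlat = FALSE):
cycleways = wy_lines[which(wy_lines$highway == "cycleway"), ]
cycleways_bng = sf::st_transform(cycleways, 27700)
# Keep crashes that happened within 100 m of a cycleway:
crashes_near_cycleways = sf::st_filter(
  crashes_sf,
  cycleways_bng,
  .predicate = sf::st_is_within_distance,
  dist = 100
)
nrow(crashes_near_cycleways)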
Import and visualise a dataset with supermarket names and locations with the following code (see the source code of the session to see how the supermarket data was obtained with osmextract):
supermarkets = sf::read_sf("https://github.com/ITSLeeds/tds/releases/download/2025/supermarkets_points_cleaned.geojson")
library(tmap)
tmap_mode("view")
tm_shape(supermarkets) +
tm_dots("name_simplified")
2.2 (Optional) Overture Maps
Overture Maps is a new geospatial dataset designed to complement OpenStreetMap. It is produced by the Overture Maps Foundation, a collaboration between several large mapping and technology organizations (Amazon, Meta, Microsoft and TomTom).
While OSM is community-edited and extremely rich, Overture Maps takes a different approach: it publishes curated, standardised layers designed to be easy to use at scale for data science and analytics. OSM acts as the foundational data source, but its tagging system is very flexible and often inconsistent; Overture cleans the OSM data, converts it to a fixed schema, and combines it with additional open and proprietary datasets contributed by its member organisations.
Read the docs to explore how to download and use Overture Maps datasets in GeoParquet format. Data in Overture Maps is characterised by theme and type: try exploring these using the layer options in the Explorer as shown in the image below.

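If you want to try this from R, the sketch below uses the arrow package (built with S3 support) to open one Overture layer directly from the public S3 bucket without downloading it in full. The release string in the path is an assumption and changes with each release, so check docs.overturemaps.org for the current one:
library(arrow)
# Connect anonymously to the public Overture Maps bucket:
bucket = s3_bucket("overturemaps-us-west-2", anonymous = TRUE)
# Open the 'places' layer (theme = places, type = place) lazily, without downloading it all:
places = open_dataset(bucket$path("release/2024-09-18.0/theme=places/type=place"))
names(places) # inspect the schema before filtering and collecting a subset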
3 Getting road traffic casualty data
Work through the reproducible code in the “Getting started with stats19” vignette hosted at docs.ropensci.org/stats19.
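Before working through the vignette, the following minimal sketch may help, assuming the stats19 package is installed; the year is just an example, and the type argument may be "collision" or "accident" depending on your package version:
library(stats19)
# Download one year of road collision data published by the DfT:
crashes = get_stats19(year = 2022, type = "collision")
# Convert the easting/northing columns into an sf object in British National Grid:
crashes_sf = format_sf(crashes, lonlat = FALSE)
plot(crashes_sf["accident_severity"])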
4 Boundary datasets
Boundary datasets are useful for mapping and spatial analysis, providing the geographical context for other datasets. You can download geographic datasets directly from the ONS Geoportal.
You can also search for boundary datasets using the esri2sf package, which provides the esrisearch() function for searching datasets on the ESRI ArcGIS platform. To illustrate this programmatic way of getting boundary data, we will search for the “Local Authority Districts (May 2023) Boundaries UK” dataset and download it using the arcgis package.
pak::pkg_install("elipousson/esri2sf")
remotes::install_github("r-arcgis/arcgis", dependencies = TRUE)
res = esri2sf::esrisearch("Local Authority Districts (May 2023) Boundaries UK")
res = res |>
dplyr::filter(type == "Feature Service") |>
# # 2023 versions:
# dplyr::filter(str_detect(title, "2023")) |>
# BUC:
dplyr::filter(str_detect(title, "BUC"))
res$title
u_from_res = paste0(res$url[1], "/0")
library(arcgis)
res_sf = arc_read(u_from_res)
plot(res_sf$geometry)
5 Census data
5.1 The ONS “create a custom dataset” tool
The Office for National Statistics (ONS) provides a tool to create custom datasets. The tool is flexible and provides datasets in a variety of formats, including CSV. Give the tool a try at www.ons.gov.uk/datasets/create. To test the tool, try to get data on travel to work patterns for all usual residents in England and Wales at the local authority level (note: when running the code below, you may need to change the file name to match the one you downloaded).
res_sf = sf::st_read("lad_boundaries_2023.geojson")
Reading layer `lad_boundaries_2023' from data source
  `/home/runner/work/tds/tds/s2/lad_boundaries_2023.geojson' using driver `GeoJSON'
Simple feature collection with 361 features and 11 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -116.1928 ymin: 7054.1 xmax: 655653.9 ymax: 1220310
Projected CRS: OSGB36 / British National Grid
travel_to_work_lad = readr::read_csv("custom-filtered-2025-02-04T00_06_30Z.csv")
# names(travel_to_work_lad)
# [1] "Lower tier local authorities Code"
# [2] "Lower tier local authorities"
# [3] "Distance travelled to work (8 categories) Code"
# [4] "Distance travelled to work (8 categories)"
# [5] "Method used to travel to workplace (12 categories) Code"
# [6] "Method used to travel to workplace (12 categories)"
# [7] "Observation"
travel_to_work_updated = travel_to_work_lad |>
select(
LAD23CD = `2023 Lower tier local authorities Code`,
Mode = `Method used to travel to workplace (12 categories)`,
Distance = `Distance travelled to work (8 categories)`,
Observation = Observation
)
# Pivot wider:
ttw_wide = travel_to_work_updated |>
pivot_wider(names_from = c(Distance, Mode), values_from = Observation)
summary(res_sf[["LAD23CD"]] %in% travel_to_work_lad[[1]])
   Mode   FALSE    TRUE 
logical      44     317 
# Other way around:
summary(travel_to_work_lad[[1]] %in% res_sf[["LAD23CD"]])
   Mode    TRUE 
logical   30432 
# names(ttw_wide)
6 The Cadence platform
Sign up to the Cadence website at cadence360.cityscience.com/ by clicking ‘Sign In’ in the top right. New users can then either create an account or sign in to an existing account.
7 Joining datasets
Two key ways to join datasets are by spatial location and by a common key. We will demonstrate the latter using the dplyr package.
lad_joined = left_join(
res_sf,
ttw_wide
)
Let’s visualise the results with a choropleth map made using ggplot2.
ggplot(lad_joined) +
geom_sf(aes(fill = `Less than 5km_Driving a car or van`), colour = NA) +
scale_fill_viridis_c() +
theme_minimal()
See r.geocompx.org/spatial-operations for spatial joins.
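As a taster for spatial joins, here is a hedged sketch assuming the supermarkets object from Section 2 is still loaded; it attaches the code of the containing local authority to each supermarket:
# Both layers must share a CRS before joining by location:
supermarkets_bng = sf::st_transform(supermarkets, sf::st_crs(res_sf))
supermarkets_lad = sf::st_join(supermarkets_bng, res_sf["LAD23CD"])
# Count supermarkets per local authority:
head(sort(table(supermarkets_lad$LAD23CD), decreasing = TRUE))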
8 Homework
Read Chapter 28: Quarto in R for Data Science and answer at least one of the questions in the Exercises section of that chapter.
Take a read of the recent Nature article on general intelligence (Chen et al., 2026) and reflect on its implications for data science and transport planning, in preparation for the next session.
8.1 Bonus
The following are optional bonus tasks that are not essential but will help deepen your understanding.
Register for GitHub Copilot or Google Gemini Pro for students and test out at least one tool for agentic coding such as GitHub Copilot in VSCode or Gemini CLI.
Take a quick read of, and try to reproduce some of the code from, another chapter of R4DS, such as:
- Chapter 1: Data visualisation
- Chapter 3: Data transformation
- One of your choice