library(tidyverse)
library(sf)
library(tmap)
Getting transport datasets with R
1 Introduction
In this practical session, we will learn how to get transport datasets using R. The contents of the session are as follows:
- We’ll start with a short lecture on data sources and ways of classifying transport datasets (see the slides)
- Reviewing the homework from the previous session
- Practical session importing and exploring a range of transport datasets in your own time
- Bonus: exploring the Cadence platform
- Homework for the next session
1.1 Review Homework
You should now be familiar with the basics of R, Quarto and the structure of transport datasets, having completed the homework from the previous session.
We will try to reproduce last week's demo and discuss any issues you had running the code in Chapter 13 of Geocomputation with R.
1.2 Prerequisites
Note: you may need to install the pct package as follows:
remotes::install_github("ITSLeeds/pct")
We will also load the following packages:
library(tidyverse)
library(sf)
library(tmap)
Note that this practical imports and uses geographic data with the sf package. Read more about the package in Chapter 2 onwards of Geocomputation with R and in the sf package documentation.
2 Getting OpenStreetMap data
Work through the reproducible code in the “Introducing osmextract” vignette hosted at https://docs.ropensci.org/osmextract/articles/osmextract.html.
2.1 Bonus exercises
- Reproduce the examples
- Get all supermarkets in OSM for West Yorkshire
- Identify all cycleways in West Yorkshire and, using the stats19 data you have already downloaded, identify all crashes that happened near them.
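For the last exercise, one possible approach is a distance-based filter with sf. The sketch below uses made-up toy geometries in place of the real OSM cycleways and stats19 crashes, and the 50 m threshold is an arbitrary assumption. With real data you would first transform both layers to a projected CRS such as British National Grid (EPSG:27700) so that distances are in metres.

```r
library(sf)

# Toy data in British National Grid (EPSG:27700), so distances are in metres.
# All coordinates below are made up for illustration.
cycleway = st_sfc(
  st_linestring(rbind(c(0, 0), c(1000, 0))),
  crs = 27700
)
crashes = st_as_sf(
  data.frame(
    crash_id = 1:4,
    x = c(100, 500, 900, 500),
    y = c(10, 40, 200, -60)
  ),
  coords = c("x", "y"),
  crs = 27700
)

# For each crash, list the cycleways within 50 m (an arbitrary threshold)
near_list = st_is_within_distance(crashes, cycleway, dist = 50)

# Keep only the crashes that have at least one cycleway nearby
crashes_near = crashes[lengths(near_list) > 0, ]
crashes_near$crash_id
```

Only the crashes 10 m and 40 m from the line are kept here. The same pattern should work with many cycleways and crashes, although a larger threshold will pick up crashes on adjacent roads.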
Import and visualise a dataset with supermarket names and locations with the following code (see the source code of the practical to see how the supermarket data was obtained with osmextract):
supermarkets = sf::read_sf("https://github.com/ITSLeeds/tds/releases/download/2025/supermarkets_points_cleaned.geojson")
library(tmap)
tmap_mode("view")
tm_shape(supermarkets) +
  tm_dots("name_simplified")
3 Getting road traffic casualty data
Work through the reproducible code in the “Getting started with stats19” vignette hosted at docs.ropensci.org/stats19.
4 Boundary datasets
Boundary datasets are useful for mapping and spatial analysis, providing the geographical context for other datasets. You can download geographic datasets directly from the ONS Geoportal.
You can also search for boundary datasets using the esri2sf package, which provides a function esrisearch to search for datasets on the ESRI ArcGIS platform. To illustrate this programmatic way of getting boundary data, we will search for the “Local Authority Districts (May 2023) Boundaries UK” dataset and download it using the arcgis package.
pak::pkg_install("elipousson/esri2sf")
remotes::install_github("r-arcgis/arcgis", dependencies = TRUE)
res = esri2sf::esrisearch("Local Authority Districts (May 2023) Boundaries UK")
res = res |>
  dplyr::filter(type == "Feature Service") |>
  # # 2023 versions:
  # dplyr::filter(str_detect(title, "2023")) |>
  # BUC:
  dplyr::filter(str_detect(title, "BUC"))
res$title
u_from_res = paste0(res$url[1], "/0")
library(arcgis)
res_sf = arc_read(u_from_res)
plot(res_sf$geometry)
5 Census data
5.1 The ONS “create a custom dataset” tool
The Office for National Statistics (ONS) provides a tool to create custom datasets. The tool is flexible and provides datasets in a variety of formats, including CSV. Give the tool a try at www.ons.gov.uk/datasets/create. To test the tool, try to get data on travel to work patterns for all usual residents in England and Wales at the local authority level (note: you may need to change the file name to match the one you downloaded).
Reading layer `lad_boundaries_2023' from data source
`/home/runner/work/tds/tds/p2/lad_boundaries_2023.geojson'
using driver `GeoJSON'
Simple feature collection with 361 features and 11 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -116.1928 ymin: 7054.1 xmax: 655653.9 ymax: 1220310
Projected CRS: OSGB36 / British National Grid
travel_to_work_lad = readr::read_csv("custom-filtered-2025-02-04T00_06_30Z.csv")
# names(travel_to_work_lad)
# [1] "Lower tier local authorities Code"
# [2] "Lower tier local authorities"
# [3] "Distance travelled to work (8 categories) Code"
# [4] "Distance travelled to work (8 categories)"
# [5] "Method used to travel to workplace (12 categories) Code"
# [6] "Method used to travel to workplace (12 categories)"
# [7] "Observation"
# Select and rename columns; adjust the names below to match names(travel_to_work_lad) for your download
travel_to_work_updated = travel_to_work_lad |>
  select(
    LAD23CD = `2023 Lower tier local authorities Code`,
    Mode = `Method used to travel to workplace (12 categories)`,
    Distance = `Distance travelled to work (8 categories)`,
    Observation = Observation
  )
# Pivot wider:
ttw_wide = travel_to_work_updated |>
  pivot_wider(names_from = c(Distance, Mode), values_from = Observation)
summary(res_sf[["LAD23CD"]] %in% travel_to_work_lad[[1]])
Mode FALSE TRUE
logical 44 317
# Other way around:
summary(travel_to_work_lad[[1]] %in% res_sf[["LAD23CD"]])
Mode TRUE
logical 30432
# names(ttw_wide)
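To see what the pivot and the key checks above are doing, here is a minimal sketch on a made-up table. The area codes, categories and counts are invented for illustration and are not real census values. It shows how pivot_wider builds column names from the distance and mode categories, and how anti_join lists the codes that have no match in the other table.

```r
library(dplyr)
library(tidyr)

# Made-up long-format table: one row per area, distance band and mode
ttw_long = tibble(
  LAD23CD = c("A1", "A1", "A2", "A2"),
  Distance = rep("Less than 5km", 4),
  Mode = c("Bicycle", "On foot", "Bicycle", "On foot"),
  Observation = c(10, 20, 30, 40)
)

# One row per area, one column per Distance/Mode combination
ttw_toy_wide = ttw_long |>
  pivot_wider(names_from = c(Distance, Mode), values_from = Observation)
names(ttw_toy_wide)

# Made-up boundary table containing one code that is absent from the census data
boundaries_toy = tibble(LAD23CD = c("A1", "A2", "A3"))

# Rows of boundaries_toy with no matching key in ttw_toy_wide
unmatched = anti_join(boundaries_toy, ttw_toy_wide, by = "LAD23CD")
unmatched
```

Inspecting the unmatched codes this way can help explain results such as the 44 boundary codes reported as FALSE above, for example areas outside England and Wales.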
6 The Cadence platform
Sign up to the Cadence website at cadence360.cityscience.com/ by clicking ‘Sign In’ in the top right. You can then either create a new account or sign in to an existing one.
7 Joining datasets
Two key ways to join datasets are by spatial location and by a common key. We will demonstrate the latter using the dplyr package.
lad_joined = left_join(
  res_sf,
  ttw_wide
)
Let’s visualise the results with a choropleth map made using ggplot2.
ggplot(lad_joined) +
geom_sf(aes(fill = `Less than 5km_Driving a car or van`), colour = NA) +
scale_fill_viridis_c() +
theme_minimal()
See r.geocompx.org/spatial-operations for spatial joins.
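As a small illustration of the other kind of join, the sketch below joins points to polygons by location using sf::st_join. The zones and points are made-up toy geometries, and the helper function square() is hypothetical, written only for this example.

```r
library(sf)

# Hypothetical helper: build a square polygon from its lower-left corner and side length
square = function(xmin, ymin, size) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin), c(xmin + size, ymin + size),
    c(xmin, ymin + size), c(xmin, ymin)
  )))
}

# Two made-up square zones in British National Grid (EPSG:27700)
zones = st_sf(
  zone_name = c("West", "East"),
  geometry = st_sfc(square(0, 0, 100), square(100, 0, 100), crs = 27700)
)

# Three made-up points; the third lies outside both zones
points = st_as_sf(
  data.frame(point_id = 1:3, x = c(50, 150, 250), y = c(50, 50, 50)),
  coords = c("x", "y"),
  crs = 27700
)

# Spatial join: attach zone attributes to each point that falls within a zone
points_joined = st_join(points, zones, join = st_within)
points_joined$zone_name
```

By default st_join keeps all rows of the first object, so the point outside both zones is retained with a missing zone_name, much as left_join keeps unmatched rows when joining by key.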
8 Homework
- In preparation for the next practical session, read and try to reproduce the code in the vignette “An introduction to origin-destination data” for the od package.
- Import some OD data using the pct package, as documented at itsleeds.github.io/pct.
- Download and visualise 3 transport-related datasets of your choice and save the results in a reproducible .qmd file.
- Bonus: generate a .pdf document showing the results.
- Take a quick read of, and try to reproduce some of the code in, at least three of the chapters in R4DS:
- Try to reproduce and modify the code I wrote during the live demo to get OSM data for a city of your choice.
- See the source code here: https://github.com/itsleeds/tds/blob/main/p2/demo.qmd
- See the results here: https://itsleeds.github.io/tds/p2/demo.html
- Check out the code I used to generate the interactive map of source code in this Discussion on GitHub: https://github.com/itsleeds/tds/discussions/166
- Bonus: comment on the discussion with your thoughts on the code and how it could be improved.
- Bonus 2: try opening a new Discussion comment in the tds repo at github.com/itsleeds/tds/discussions and share your thoughts on the practical session.