1 Introduction

This guide supports workshops on advanced usage and development of the Propensity to Cycle Tool (PCT). Beginner and intermediate PCT events focus on using the PCT via the web application hosted at www.pct.bike and the data provided by the PCT in QGIS.

The focus here is on analysing cycling potential in the open source statistical programming language R. We use R because the PCT was developed in, and can be extended with, R code. Using open source software with a command-line interface reduces barriers to entry, enabling the development of open access transport models for more citizen-led and participatory transport planning, including integration with the A/B Street city simulation and editing software (Lovelace 2021).

The guide covers:

  • How to download data from the PCT (covered in more detail, using R and QGIS, in the ‘getting’ vignette)
  • How to compare PCT data with cycle infrastructure data to identify gaps in the network
  • How the code underlying the PCT works
  • How to generate new scenarios of cycling uptake

See the marked-up version of this vignette online at the following URL:

https://github.com/ITSLeeds/pct/releases/download/0.8.1/Propensity.to.Cycle.Tool.Advanced.Workshop.html

1.1 Preparation

If you are new to R, you should install R and RStudio before the course. For instructions on that, see the download links at cran.r-project.org and RStudio.com.

R is a powerful statistical programming language for data science and a wide range of other applications and, like any language, takes time to learn. To get started we recommend the following free resources:

If you want to calculate cycle routes from within R, we recommend signing up for a CycleStreets API key. See here to apply and see here for instructions on creating an ‘environment variable’ (recommended for experienced R users only).
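
As a minimal sketch of the environment variable step, the snippet below assumes the key is read from a variable named CYCLESTREETS (our understanding of what the cyclestreets package looks for; check its documentation if unsure). The helper has_cyclestreets_key() is hypothetical and not part of any package, and the value shown is a placeholder rather than a real key.

```r
# Hypothetical helper: TRUE if the environment variable holding the key is set.
# Assumption: the variable is called CYCLESTREETS.
has_cyclestreets_key = function(var = "CYCLESTREETS") {
  nzchar(Sys.getenv(var))
}

# Set the key for the current R session only (placeholder, not a real key)
Sys.setenv(CYCLESTREETS = "replace-with-your-key")
has_cyclestreets_key()
```

Sys.setenv() only lasts for the current session. To keep the key between sessions, a line of the form CYCLESTREETS=your-key can be added to your .Renviron file instead.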

If you are not familiar with the PCT, it may also be worth reading about it before the course starts.

1.2 Prior reading

In addition to computer hardware (a laptop) and software (an up-to-date R set-up and experience using R) pre-requisites, you should have read, or at least have working knowledge of the contents of, the following publications, all of which are freely available online:

1.3 Prerequisites

To ensure your computer is ready for the course, you should be able to run the following lines of R code on your computer:

install.packages("remotes")
pkgs = c(
  "cyclestreets",
  "mapview",
  "pct",
  "sf",
  "stats19",
  "stplanr",
  "tidyverse",
  "tmap",      # used for the maps in section 3
  "devtools"
)
remotes::install_cran(pkgs)
# remotes::install_github("ITSLeeds/pct")
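
A quick way to see whether any of the packages are still missing is sketched below. It uses only base R; the object names installed and missing_pkgs are ours, chosen for illustration.

```r
pkgs = c("cyclestreets", "mapview", "pct", "sf", "stats19", "stplanr", "tidyverse")

# requireNamespace() returns FALSE (rather than an error) for a missing package
installed = vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs = pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages found")
}
```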

To test your computer is ready to work with PCT data in R, you can also try running the code hosted at https://raw.githubusercontent.com/ITSLeeds/pct/master/inst/test-setup.R to check everything is working:

source("https://github.com/ITSLeeds/pct/raw/master/inst/test-setup.R") 

If you have any questions before the workshop, feel free to ask a question on the package’s issue tracker (requires a GitHub login): https://github.com/itsleeds/pct/issues

2 How the PCT works and what you can use it for

  • Take a look at the image below, from Woodcock et al. (2021), which builds on an earlier diagram in Lovelace et al. (2017). Which parts of the process are of most interest for your work?

The PCT provides data at 4 geographic levels:

  • Zones
  • Desire lines
  • Routes
  • Route networks

Which types of data are most appropriate to tackle each of the questions/problems you identified?

  • Name 3 limitations of the data currently provided by the PCT and discuss how you could overcome them

3 Getting and exploring PCT data

In this section you will learn about the open datasets provided by the PCT project and how to use them. While the most common use of the PCT is via the interactive web application hosted at www.pct.bike, there is much value in downloading the data, e.g. to identify existing cycling infrastructure in close proximity to routes with high potential, and to help identify roads in need of interventions from a safety perspective, using data from the constantly evolving and community-driven global geographic database OpenStreetMap (OSM) (Barrington-Leigh and Millard-Ball 2017).

In this session, which assumes you have experience using QGIS or R, you will learn how to:

  • Find data on travel behaviour from the 2011 Census and from the School Census
  • Download and import data from the PCT into QGIS
  • Process the data alongside infrastructure data, to help find gaps in the network

3.1 Getting PCT data

We will get PCT data at the MSOA level and plot the result in a simple map.

  • G1: The first stage is to load packages we will use:
library(pct)
library(sf)      # key package for working with spatial vector data
library(dplyr)   # in the tidyverse
library(tmap)    # installed alongside mapview
  • G2: After setting the region name (to avoid re-typing it many times) run the following commands to download data at the four main levels used in the PCT:
region_name = "isle-of-wight"
zones_all = get_pct_zones(region_name, geography = "msoa")
lines_all = get_pct_lines(region_name, geography = "msoa")
routes_all = get_pct_routes_fast(region_name, geography = "msoa")
rnet_all = get_pct_rnet(region_name)
  • G3: Check the downloads worked by plotting the result as follows:
plot(zones_all$geometry)
plot(lines_all$geometry, col = "blue", add = TRUE)
plot(routes_all$geometry, col = "green", add = TRUE)
plot(rnet_all$geometry, col = "red", lwd = sqrt(rnet_all$bicycle), add = TRUE)

3.2 Visualising PCT data

At its heart, the PCT is a data visualisation tool.

  • V1: Create a static plot showing the route network layer with the tmap package
tm_shape(rnet_all) +
  tm_lines(lwd = "dutch_slc", scale = 9)
  • V2: Create an interactive map showing the route network layer with the tmap package
# interactive plot
tmap_mode("view")
tm_shape(rnet_all) +
  tm_lines(lwd = "dutch_slc", scale = 9)
  • V3 (Bonus): Run the following commands to show the % of short trips in the Isle of Wight made by active modes.
# basic plot
max_distance = 7
# plot(zones_all$geometry)
# plot(lines_all$geometry[lines_all$all > 500], col = "red", add = TRUE)

# create 'active' desire lines (shorter than max_distance km)
active = lines_all %>% 
  mutate(`Percent Active` = (bicycle + foot) / all * 100) %>% 
  filter(e_dist_km < max_distance)

tm_shape(active) +
  tm_lines("Percent Active", palette = "RdYlBu", lwd = "all", scale = 9)
  • V4 (Bonus): Use the same technique to identify short distance trips with a high mode share by car, as follows:
# Create car dependent desire lines
car_dependent = lines_all %>% 
  mutate(`Percent Drive` = (car_driver) / all * 100) %>% 
  filter(e_dist_km < max_distance)
tm_shape(car_dependent) +
  tm_lines("Percent Drive", palette = "-RdYlBu", lwd = "all", scale = 9)
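
The mode share calculations in V3 and V4 can be checked by hand on a small table. The sketch below uses base R and made-up numbers (toy_lines is not PCT data); the column names mirror those used above.

```r
# Toy desire lines: made-up counts, not PCT data
toy_lines = data.frame(
  all = c(100, 200, 50, 400),
  bicycle = c(5, 10, 1, 8),
  foot = c(20, 10, 0, 4),
  car_driver = c(60, 150, 45, 300),
  e_dist_km = c(1.5, 4, 12, 6.5)
)
max_distance = 7

# Mode shares as percentages of all commuters on each line
toy_lines$percent_active = (toy_lines$bicycle + toy_lines$foot) / toy_lines$all * 100
toy_lines$percent_drive = toy_lines$car_driver / toy_lines$all * 100

# Keep only lines shorter than the distance threshold
toy_short = toy_lines[toy_lines$e_dist_km < max_distance, ]
toy_short[, c("e_dist_km", "percent_active", "percent_drive")]
```

The 12 km line is dropped by the filter, leaving three lines whose shares can be compared directly.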

Advanced: visualise the PCT data using a range of visualisation techniques. For inspiration, check out the Making maps with R chapter of Geocomputation with R.

3.3 Exploring PCT data

  • E1: Using the PCT’s online interface, hosted at www.pct.bike/m/?r=isle-of-wight, identify the MSOA zone that has the highest number of people who cycle to work.

  • E2: Using data downloaded with the command get_pct_zones(), identify the zone that has the highest level of cycling with the function top_n() and save the result as an object called z_highest_cycling (hint: you may want to start by ‘cleaning’ the data you have downloaded to include only a few key columns with the function select(), as follows):

z = zones_all %>% 
  select(geo_code, geo_name, all, foot, bicycle, car_driver)
  • E3: Use the plot() command to visualise where on the Isle of Wight this ‘high cycling’ zone is (hint: you will need to use the plot() function twice, once to plot z$geometry, and again with the argument add = TRUE and a col argument to add the layer on top of the base layer and give it a colour). The result should look something like this:

  • E4: Using the online interface, identify the top 5 MSOA to MSOA desire lines that have the highest number of people who cycle to work.

  • E5: Using the function top_n(n = 5, wt = bicycle), identify the top 5 MSOA to MSOA desire lines that have the highest number of people who cycle to work (hint: you might want to start with the code shown below).

    • Bonus: also find the 5 desire lines with the highest number of people driving to work. Plot them and find the straight line distance of these lines with the function st_length().
# Aim: get top 5 cycle routes
l_msoa = lines_all %>% 
  select(geo_code1, geo_code2, all, foot, bicycle, car_driver, rf_avslope_perc, rf_dist_km)
  • E6 (Bonus): Repeat the exercise but calculate the top 10 LSOA to LSOA desire lines (by setting the argument geography = "lsoa"; remember to change the names of the objects you create). The results should look like this:

  • E7: Why are the results different? What are the advantages and disadvantages of using smaller zones, as represented by the LSOA data above?
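
For E2 and E5, the kind of selection that top_n() performs can be previewed on toy data. The sketch below is a base R stand-in (which.max() and order() rather than dplyr), and toy_zones holds made-up values, not PCT data.

```r
# Toy zones: made-up values, not PCT data
toy_zones = data.frame(
  geo_code = c("Z1", "Z2", "Z3", "Z4"),
  all = c(3000, 2500, 4000, 1500),
  bicycle = c(90, 140, 60, 30)
)

# Zone with the most cyclists (the row top_n(n = 1, wt = bicycle) would select here)
toy_highest = toy_zones[which.max(toy_zones$bicycle), ]

# Top 2 zones by number of cyclists, largest first
toy_top2 = head(toy_zones[order(toy_zones$bicycle, decreasing = TRUE), ], 2)
toy_top2
```

One difference to keep in mind: top_n() keeps tied rows and leaves the rows in their original order, whereas the order() and head() approach sorts the rows and cuts off at exactly n.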

3.4 Modifying PCT data to identify routes/roads of interest

  • M1: Building on the MSOA examples above, add a new column called pcycle to the object l_msoa that contains the % who cycle to work (hint: you might want to start this by typing l_msoa$pcycle = ...) and plot the results (shown in left hand panel in plot below).
l_msoa$pcycle = l_msoa$bicycle / l_msoa$all * 100
# plot(l_msoa["pcycle"], lwd = l_msoa$all / mean(l_msoa$all), breaks = c(0, 5, 10, 20, 50))
  • M2 (bonus): identify road segments with the highest estimated number of people cycling currently, and under the Go Dutch scenario (hint: you can download the route network with get_pct_rnet("isle-of-wight"))

  • M3: Calculate the proportion of trips in the Isle of Wight that are less than 10 km in length and (bonus) plot the cumulative distribution of the fastest route distances

  • M4: Subset and then plot all the MSOA-MSOA desire lines that have a route distance of less than 10 km and more than 5 people travelling by that mode, for each of the following modes:

    • walking
    • cycling
    • driving
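# desire lines with a fastest route distance of less than 10 km
l_less_than_10km = l_msoa %>% 
  filter(rf_dist_km < 10)
plot(l_less_than_10km %>% filter(foot > 5) %>% select(foot))
plot(l_less_than_10km %>% filter(bicycle > 5) %>% select(bicycle))
plot(l_less_than_10km %>% filter(car_driver > 5) %>% select(car_driver))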
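
For M3, note that the share of desire lines under 10 km is not the same as the share of trips under 10 km, because lines carry different numbers of commuters. The sketch below shows both on made-up values (toy_l is not PCT data), along with the cumulative share of trips by distance.

```r
# Toy desire lines: made-up values, not PCT data
toy_l = data.frame(
  rf_dist_km = c(2, 4, 8, 12, 15),
  all = c(100, 50, 50, 150, 50)
)

# Share of desire lines under 10 km (each line counted once)
prop_lines = mean(toy_l$rf_dist_km < 10)

# Share of trips under 10 km (each line weighted by its number of commuters)
prop_trips = sum(toy_l$all[toy_l$rf_dist_km < 10]) / sum(toy_l$all)

# Cumulative share of trips, ordered from shortest to longest route
ord = order(toy_l$rf_dist_km)
cum_share = cumsum(toy_l$all[ord]) / sum(toy_l$all)
data.frame(rf_dist_km = toy_l$rf_dist_km[ord], cum_share)
```

Passing the sorted distances and cum_share to plot() with type = "s" draws the cumulative distribution as a step graph.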

4 PCT scenarios

This section is designed for people with experience with the PCT and cycling uptake estimates who want to learn more about how uptake models work.

  • S1: Generate a ‘Go Dutch’ scenario for the Isle of Wight using the function uptake_pct_godutch() (hint: the following code chunk will create a ‘Government Target’ scenario):
l_msoa$euclidean_distance = as.numeric(sf::st_length(l_msoa))
l_msoa$pcycle_govtarget = uptake_pct_govtarget_2020(
  distance = l_msoa$rf_dist_km,
  gradient = l_msoa$rf_avslope_perc
  ) * 100 + l_msoa$pcycle
  • S2: Think of alternative scenarios that would be useful for your work
  • S3 (bonus): look inside the function uptake_pct_godutch() - how could it be modified?
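
To get a feel for how an uptake function of this kind behaves before opening the package code, the sketch below defines a simple logistic function of distance and gradient. It is an illustration only: toy_uptake() and its coefficients are hypothetical placeholders, not the model or values used in the pct package.

```r
# Hypothetical uptake function: NOT the model or coefficients used in the pct package.
# distance is in km, gradient in percent; returns a proportion between 0 and 1.
toy_uptake = function(distance, gradient,
                      alpha = -4, b_dist = -0.6, b_sqrt = 1.9, b_grad = -0.3) {
  logit = alpha + b_dist * distance + b_sqrt * sqrt(distance) + b_grad * gradient
  plogis(logit)  # inverse logit, maps any real number to (0, 1)
}

distances = c(1, 3, 5, 10, 20)
flat = toy_uptake(distances, gradient = 0)
hilly = toy_uptake(distances, gradient = 5)
round(cbind(distances, flat, hilly), 3)
```

Changing the placeholder coefficients and re-running the last three lines shows how sensitive the estimated uptake is to each term, which is one way into S2 and S3.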

5 Routing and route networks

A key aspect of the PCT is routing. This section demonstrates how to calculate cycling routes in R, to support evidence-based transport planning.

5.1 Routes

  • R1: Using the function route(), find the route associated with the most cycled desire line in the Isle of Wight. If you use the arguments route_fun = osrmRoute and returnclass = "sf", the result should look similar to that displayed in the map below (hint: you may want to start your answer with the following lines of code):
l_top = l_msoa %>% 
  top_n(n = 1, wt = bicycle)
  • R2: What are the problems associated with this route from a cycling perspective? Take a look at the help page opened by entering ?route to identify the reason why the route is not particularly useful from a cycling perspective.

  • R3: Regenerate the route using the following command: route(l = l_top, route_fun = cyclestreets::journey). What is the difference in the length between each route, and what other differences can you spot? Note: this exercise requires an API Key from CycleStreets.net.

  • R4 (bonus): what features of a routing service would be most useful for your work and why?

5.2 Route networks

  • RN1: Generate a ‘route network’ showing number of people walking in the top 30 routes in the Isle of Wight, allocated to the transport network (hint: use the overline() function and begin the script as follows, the results should look similar to the results below):
route_data = sf::st_sf(wight_lines_30, geometry = wight_routes_30$geometry)
  • RN2: Download the travel to school route network and compare the results with the route network created for RN1.

    • Which roads have greatest overlap between the two route networks?
    • For more information on the travel to school layer, see Goodman et al. (2019). What other trip purposes would you like to see in tools for cycle planning?
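
The idea behind a route network is that flows on routes sharing the same stretch of road are added together. overline() does this on line geometries; the sketch below is only a non-spatial analogy using base R's aggregate(), with made-up segment names and counts (toy_segments is not PCT data).

```r
# Toy routes broken into named road segments: made-up values, not PCT data
toy_segments = data.frame(
  route = c("A", "A", "B", "B", "C"),
  segment = c("High St", "Mill Rd", "High St", "Park Ln", "High St"),
  foot = c(10, 10, 25, 25, 5)
)

# Sum the walking flow on each segment across all routes that use it
toy_rnet = aggregate(foot ~ segment, data = toy_segments, FUN = sum)
toy_rnet[order(toy_rnet$foot, decreasing = TRUE), ]
```

In this toy case the segment used by all three routes ends up with the largest total, which is the pattern to look for when comparing the two route networks in RN2.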

6 Ideas for further work

  • Create a route network reflecting where you would invest if the priority was reducing car trips of less than 5 km
  • Design interventions to replace short car trips across London (or another region of your choice) using the PCT methods/data to support your decisions
  • Identify quiet routes and design a quiet route network for a city/region of your choice, e.g. Westminster
  • Import alternative origin-destination datasets and use those as the basis for propensity to cycle analysis for trip purposes other than single stage commutes, encapsulated in the commute layer in the PCT
  • Any other layers/scenarios/hacks: welcome! Comments in this repo’s issue tracker also welcome.

7 Useful links

These links may be useful when working through the exercises:

https://github.com/ITSLeeds/pct/releases/download/v0.9.4/training-dec-2021.html

References

Barrington-Leigh, Christopher, and Adam Millard-Ball. 2017. “The World’s User-Generated Road Map Is More Than 80% Complete.” PLOS ONE 12 (8): e0180698. https://doi.org/10.1371/journal.pone.0180698.

Goodman, Anna, Ilan Fridman Rojas, James Woodcock, Rachel Aldred, Nikolai Berkoff, Malcolm Morgan, Ali Abbas, and Robin Lovelace. 2019. “Scenarios of Cycling to School in England, and Associated Health and Carbon Impacts: Application of the ‘Propensity to Cycle Tool’.” Journal of Transport & Health 12 (March): 263–78. https://doi.org/10.1016/j.jth.2019.01.008.

Lovelace, Robin. 2021. “Open Source Tools for Geographic Analysis in Transport Planning.” Journal of Geographical Systems, January. https://doi.org/10.1007/s10109-020-00342-2.

Lovelace, Robin, Anna Goodman, Rachel Aldred, Nikolai Berkoff, Ali Abbas, and James Woodcock. 2017. “The Propensity to Cycle Tool: An Open Source Online System for Sustainable Transport Planning.” Journal of Transport and Land Use 10 (1). https://doi.org/10.5198/jtlu.2016.862.

Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. 2019. Geocomputation with R. CRC Press. https://geocompr.robinlovelace.net/.

Woodcock, James, Rachel Aldred, Robin Lovelace, Tessa Strain, and Anna Goodman. 2021. “Health, Environmental and Distributional Impacts of Cycling Uptake: The Model Underlying the Propensity to Cycle Tool for England and Wales.” Journal of Transport & Health 22 (September): 101066. https://doi.org/10.1016/j.jth.2021.101066.