Transport Data Science

This module focuses on applying data science techniques to solve real-world transport problems. Based at the University of Leeds’ Institute for Transport Studies (module code TRAN5340M), the course is led by Robin Lovelace, Professor of Transport Data Science and developer of several data-driven solutions for effective transport planning.

The course has evolved over a decade of teaching and research in the field. It aims to equip you with up-to-date and future-proof skills through practical examples and reproducible workflows using industry-standard data science tools.

Prerequisites

Before taking this course, we expect you to have some experience with programming, data science, and general computing. While experience with geographic data is helpful, it is not required.

Hardware

We highly recommend having access to a computer with at least 8 GB of RAM that you have permission to install software on.

Alternatively, you could use cloud-based services such as RStudio Cloud, Google Colab, or GitHub Codespaces. However, you would need to be comfortable using these services and may miss out on some benefits of using your own computer.

Computing experience

You should be comfortable with general computing tasks, such as:

  • Creating folders and managing files
  • Installing software
  • Using command line interfaces (PowerShell in Windows, Terminal in macOS, or Linux shell)

Data science experience prerequisites

Prior experience using R or Python is essential. This could include:

  • Using these languages in professional work
  • Experience from previous degrees
  • Completion of relevant online courses

Students can demonstrate this prerequisite knowledge by showing evidence they have:

Substantial programming and data science experience in previous professional or academic work using languages like R or Python also satisfies the prerequisite requirements.

Software requirements and installation

To participate in this course, you’ll need to install specific software.

The teaching is delivered primarily in R, with some Python code and examples. You should install R (recommended), Python, or both on your computer.

We recommend using R for practical sessions and coursework unless you have a specific reason to use Python. If you choose Python or another language, please note:

  • You will receive less direct support
  • You’ll need skills to set up and manage your own environments

We welcome translations of R code examples into other languages. Contributions to course materials via Pull Requests on GitHub are encouraged.

Quickstart with GitHub Codespaces

For a quick cloud-based setup, you can use GitHub Codespaces to access the course materials:

  1. Sign up to GitHub
  2. Fork the repository
  3. Click the “Open in GitHub Codespaces” button above

Alternatively, use the following link:

Open in GitHub Codespaces

Open in GitHub Codespaces

R

Install a recent version of R (4.3.0 or above) and an IDE:

  • R from cran.r-project.org
  • RStudio from rstudio.com (recommended)
  • Alternatively, VS Code with the R extension installed (if you have prior experience with it)

You’ll also need to install R packages:

  • Individual packages can be installed by opening RStudio and typing commands like install.packages("stats19") in the R console
  • To install all dependencies for the module at once, run the following command in the R console:
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("itsleeds/tds")

See Section 1.5 of the online guide Reproducible Road Safety Research with R for instructions on how to install key packages we will use in the module.1

Python

If you choose to use Python, you should be comfortable with:

  • Installing Python
  • Managing your own Python environment
  • Installing packages and resolving package conflicts

For Python users, we recommend using an environment manager such as:

  • pixi (which can manage both R and Python environments)
  • Docker (best practice for reproducibility and isolation)

Docker (advanced)

We maintain a Docker image containing all necessary software to complete the course with VS Code, Quarto, and a Devcontainer setup.

Advantages:

  • Ensures reproducibility
  • Saves time installing software

Disadvantages:

  • Docker can be challenging to install
  • Difficult to use if you’re unfamiliar with Docker

We recommend this approach only for people who are confident with Docker or willing to invest time learning it.

For guidance, see:

R, Python or other?

While the module focuses on methods implementable in many languages, we expect most participants will use R for practicals and the final course project.

We recommend R because:

  • It provides a data science environment with batteries included
  • It offers many mature packages for data manipulation, visualization, and statistical analysis
  • These packages are available within seconds without worrying about conflicts or environment management
  • The module team has the most experience with R

Python is another excellent choice for transport data science. Many of our R code examples have been ported to Python, as illustrated below. This example shows how to load the R package {sf} and its Python equivalent, {geopandas}:

library(sf)
geo_data = read_sf("geo_data.gpkg")
import geopandas as gpd
geo_data = gpd.read_file("geo_data.gpkg")

If you choose Python, you will need to:

  • Manage your own Python environment
  • Translate R code examples into Python

For the adventurous, you could try using:

  • Julia
  • JavaScript/TypeScript (e.g., via Observable)
  • Other programming languages

However, please note that support for these alternative languages will be limited.

Contributing to the course

See the README for instructions on how to contribute to the course materials.

Footnotes

  1. For further guidance on setting-up your computer to run R and RStudio for spatial data, see these links, we recommend Chapter 2 of Geocomputation with R (the Prerequisites section contains links for installing spatial software on Mac, Linux and Windows): https://geocompr.robinlovelace.net/spatial-class.html and Chapter 2 of the online book Efficient R Programming, particularly sections 2.3 and 2.5, for details on R installation and set-up and the project management section.↩︎