if (!requireNamespace("remotes", quietly = TRUE)) {
install.packages("remotes")
}::install_github("itsleeds/tds") remotes
Transport Data Science
This module focuses on applying data science techniques to solve real-world transport problems. Based at the University of Leeds’ Institute for Transport Studies (module code TRAN5340M), the course is led by Robin Lovelace, Professor of Transport Data Science and developer of several data-driven solutions for effective transport planning.
The course has evolved over a decade of teaching and research in the field. It aims to equip you with up-to-date and future-proof skills through practical examples and reproducible workflows using industry-standard data science tools.
Prerequisites
Before taking this course, we expect you to have some experience with programming, data science, and general computing. While experience with geographic data is helpful, it is not required.
Hardware
We highly recommend having access to a computer with at least 8 GB of RAM that you have permission to install software on.
Alternatively, you could use cloud-based services such as RStudio Cloud, Google Colab, or GitHub Codespaces. However, you would need to be comfortable using these services and may miss out on some benefits of using your own computer.
Computing experience
You should be comfortable with general computing tasks, such as:
- Creating folders and managing files
- Installing software
- Using command line interfaces (PowerShell in Windows, Terminal in macOS, or Linux shell)
Data science experience prerequisites
Prior experience using R or Python is essential. This could include:
- Using these languages in professional work
- Experience from previous degrees
- Completion of relevant online courses
Students can demonstrate this prerequisite knowledge by showing evidence they have:
- Worked with R previously
- Completed online courses such as the first 4 sessions in the RStudio Primers series
- Completed DataCamp’s Free Introduction to R course
Substantial programming and data science experience in previous professional or academic work using languages like R or Python also satisfies the prerequisite requirements.
Software requirements and installation
To participate in this course, you’ll need to install specific software.
The teaching is delivered primarily in R, with some Python code and examples. You should install R (recommended), Python, or both on your computer.
We recommend using R for practical sessions and coursework unless you have a specific reason to use Python. If you choose Python or another language, please note:
- You will receive less direct support
- You’ll need skills to set up and manage your own environments
We welcome translations of R code examples into other languages. Contributions to course materials via Pull Requests on GitHub are encouraged.
Quickstart with GitHub Codespaces
For a quick cloud-based setup, you can use GitHub Codespaces to access the course materials:
- Sign up to GitHub
- Fork the repository
- Click the “Open in GitHub Codespaces” button above
Alternatively, use the following link:
R
Install a recent version of R (4.3.0 or above) and an IDE:
- R from cran.r-project.org
- RStudio from rstudio.com (recommended)
- Alternatively, VS Code with the R extension installed (if you have prior experience with it)
You’ll also need to install R packages:
- Individual packages can be installed by opening RStudio and typing commands like
install.packages("stats19")
in the R console - To install all dependencies for the module at once, run the following command in the R console:
See Section 1.5 of the online guide Reproducible Road Safety Research with R for instructions on how to install key packages we will use in the module.1
Python
If you choose to use Python, you should be comfortable with:
- Installing Python
- Managing your own Python environment
- Installing packages and resolving package conflicts
For Python users, we recommend using an environment manager such as:
pixi
(which can manage both R and Python environments)- Docker (best practice for reproducibility and isolation)
Docker (advanced)
We maintain a Docker image containing all necessary software to complete the course with VS Code, Quarto, and a Devcontainer setup.
Advantages:
- Ensures reproducibility
- Saves time installing software
Disadvantages:
- Docker can be challenging to install
- Difficult to use if you’re unfamiliar with Docker
We recommend this approach only for people who are confident with Docker or willing to invest time learning it.
For guidance, see:
R, Python or other?
While the module focuses on methods implementable in many languages, we expect most participants will use R for practicals and the final course project.
We recommend R because:
- It provides a data science environment with batteries included
- It offers many mature packages for data manipulation, visualization, and statistical analysis
- These packages are available within seconds without worrying about conflicts or environment management
- The module team has the most experience with R
Python is another excellent choice for transport data science. Many of our R code examples have been ported to Python, as illustrated below. This example shows how to load the R package {sf}
and its Python equivalent, {geopandas}
:
library(sf)
= read_sf("geo_data.gpkg") geo_data
import geopandas as gpd
= gpd.read_file("geo_data.gpkg") geo_data
If you choose Python, you will need to:
- Manage your own Python environment
- Translate R code examples into Python
For the adventurous, you could try using:
- Julia
- JavaScript/TypeScript (e.g., via Observable)
- Other programming languages
However, please note that support for these alternative languages will be limited.
Contributing to the course
See the README for instructions on how to contribute to the course materials.
Footnotes
For further guidance on setting-up your computer to run R and RStudio for spatial data, see these links, we recommend Chapter 2 of Geocomputation with R (the Prerequisites section contains links for installing spatial software on Mac, Linux and Windows): https://geocompr.robinlovelace.net/spatial-class.html and Chapter 2 of the online book Efficient R Programming, particularly sections 2.3 and 2.5, for details on R installation and set-up and the project management section.↩︎