Transport Data Science

A module teaching how to use data science to solve transport problems.

1 R or Python?

A common question when starting a new digital project is ‘which language should I use?’

The answer for most people taking this course will be R, because it is the language we will be teaching in, and the language in which the module team has the most experience.

You can use Python, and we are working on Python examples, but Python support should be considered a work in progress.

If you do choose to use Python, you will be expected to manage your own Python environment, and to be able to translate the R code examples into Python.

If you are feeling very adventurous, you could try using Julia or another language, but you will be on your own in terms of support.

2 Prerequisites

2.1 Hardware

Access to a computer with at least 8 GB of RAM, on which you have permission to install software, is highly recommended. You could instead use a cloud-based service such as Posit Cloud (formerly RStudio Cloud), Google Colab, or GitHub Codespaces, but you would need to be comfortable with these services and would miss out on some of the benefits of using your own computer.

2.2 Software

Although you are free to use any software for the course, the emphasis on reproducibility and interactive data science means that popular and established data science languages such as R and Python are recommended.

The teaching will be delivered primarily in R, with some Python code snippets and examples. Unless you have a good reason to use Python, we recommend you use R for the course.

2.2.1 R

You should have a recent stable release of R (4.3.0 or above) installed on your computer, together with RStudio (recommended) or another IDE such as VS Code (if you have prior experience with it).

  • R from cran.r-project.org
  • RStudio from rstudio.com (recommended) or VS Code with the R extension installed.
  • R packages, which can be installed from the R console in RStudio, for example by typing install.packages("stats19").
  • To install all the dependencies for the module, run the following command in the R console:
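# Install the remotes package if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
# Install the module's package, and its dependencies, from GitHub
remotes::install_github("itsleeds/tds")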

See Section 1.5 of the online guide Reproducible Road Safety Research with R for instructions on how to install key packages we will use in the module.¹
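
As a quick check after installation, the following minimal sketch (not part of the original instructions) tests whether the running R version meets the 4.3.0 minimum stated above and whether packages named in this section can be loaded. The variable names are illustrative, and the package vector is an assumption that you should extend to match the packages you need.

```r
# Minimum R version stated in this section
min_r_version <- "4.3.0"

# getRversion() returns the running R version as a comparable object
r_ok <- getRversion() >= min_r_version
message(
  "R ", as.character(getRversion()),
  if (r_ok) " meets" else " is below",
  " the minimum version ", min_r_version
)

# Packages mentioned in this section (illustrative; extend as needed)
pkgs <- c("remotes", "stats19")

# requireNamespace() returns TRUE if a package can be loaded,
# without attaching it to the search path
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Names of any packages that still need to be installed
missing_pkgs <- names(installed)[!installed]
```

If missing_pkgs is not empty, its contents can be passed to install.packages() to install the remaining packages from CRAN.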

2.2.2 Python

If you choose to use Python, you should be able to install it and manage your own Python environment, including installing packages and dealing with package conflicts. If you use Python, we recommend pixi, which can manage both R and Python environments.

2.3 Command-line experience

You should be comfortable with computing in general, for example creating folders, moving files, and installing software. You should also be comfortable using command-line interfaces such as PowerShell on Windows, Terminal on macOS, or the Linux shell.

2.4 Data science experience prerequisites

Prior experience of using R or Python (e.g. having used it for work, in previous degrees or having completed an online course) is essential.

Students can demonstrate this by showing evidence that they have worked with R before or have completed an online course, such as the first four sessions of the RStudio Primers series or DataCamp’s free Introduction to R course.

Evidence of substantial programming and data science experience in previous professional or academic work, in languages such as R or Python, also constitutes sufficient prerequisite knowledge for the course.

Footnotes

  1. For further guidance on setting up your computer to run R and RStudio for spatial data, we recommend Chapter 2 of Geocomputation with R (https://geocompr.robinlovelace.net/spatial-class.html), whose Prerequisites section contains links for installing spatial software on Mac, Linux and Windows. We also recommend Chapter 2 of the online book Efficient R Programming, particularly sections 2.3 and 2.5, for details on R installation and set-up and on project management.