Data Wrangling - Tasks
In this section, we will practice data wrangling using Python and the pandas package. Follow the steps below to manipulate and explore a dataset.
Create a new Python script in your scripts folder and name it 2-data-wrangling-and-visualization.py.
Import pandas by adding the following line at the top of your script:
    import pandas as pd
Download the crashes file (here) and save it in your data folder.
Read the crashes dataset into Python using the pd.read_csv() function and assign it to a variable named crashes:
    crashes = pd.read_csv("data/crashes.csv")
Explore the dataset:
- Use the head() method to view the first few rows of the dataset.
- Use the info() method to see the structure of the dataset: column names, data types and non-null counts.
- Use the describe() method to get summary statistics for the numeric columns.
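The three exploration calls can be tried on a small table before the real file is at hand. This is a minimal sketch under an explicit assumption: the column names casualty_type, casualty_age and dark, and every value, are hypothetical stand-ins for data/crashes.csv, loaded through io.StringIO so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for data/crashes.csv: the column names and values
# are assumptions for illustration, not the contents of the real file.
csv_text = """casualty_type,casualty_age,dark
pedestrian,20,True
cyclist,40,False
cyclist,35,True
car occupant,60,True
"""
crashes = pd.read_csv(io.StringIO(csv_text))

# First rows (default n=5; a shorter table is shown in full)
print(crashes.head())

# Column names, dtypes and non-null counts; info() prints its report itself
crashes.info()

# Summary statistics; by default only numeric columns are described,
# so the boolean 'dark' column and the text column are left out
print(crashes.describe())
```

With the real data, pass "data/crashes.csv" to pd.read_csv() instead of the io.StringIO(...) object. Because info() writes directly to the console and returns None, it is called without print().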
Data Wrangling Tasks:
- Create a new DataFrame named crashes_filtered that includes only cyclists.
- Create a new DataFrame named crashes_dark that includes only crashes that occurred in dark conditions.
- Create a new DataFrame named crashes_dark_cyclist that includes only crashes that involved cyclists and occurred in dark conditions.
- Create a summary table named crashes_by_type that shows the median age by casualty type.
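The four tasks above can be sketched with boolean indexing for the filters and groupby() for the summary table. This is a minimal sketch, not the course's reference solution, and it reuses the same hypothetical columns (casualty_type, casualty_age, dark) and toy values; check the real column names with crashes.info() and adjust them before use.

```python
import io

import pandas as pd

# Same hypothetical stand-in for data/crashes.csv (assumed columns:
# casualty_type, casualty_age, dark); swap in the real file when available.
csv_text = """casualty_type,casualty_age,dark
pedestrian,20,True
cyclist,40,False
cyclist,35,True
car occupant,60,True
"""
crashes = pd.read_csv(io.StringIO(csv_text))

# Boolean masks: one True/False value per row
is_cyclist = crashes["casualty_type"] == "cyclist"
is_dark = crashes["dark"]  # assumes 'dark' was parsed as a boolean column

# Indexing with a mask keeps only the rows where it is True
crashes_filtered = crashes[is_cyclist]
crashes_dark = crashes[is_dark]

# & combines masks element-wise (logical AND); when writing the conditions
# inline instead, wrap each one in parentheses: crashes[(a == x) & (b == y)]
crashes_dark_cyclist = crashes[is_cyclist & is_dark]

# Median age per casualty type as a two-column table
crashes_by_type = (
    crashes.groupby("casualty_type", as_index=False)["casualty_age"].median()
)
print(crashes_by_type)
```

For the toy data, crashes_by_type has one row per casualty type, sorted alphabetically by groupby(): car occupant 60.0, cyclist 37.5, pedestrian 20.0. If the real dark column is not boolean, replace is_dark with an explicit comparison on that column.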
R version: If you prefer R, the same exercises are available in R. Create a new R script (2-data-wrangling-and-visualization.R), load the tidyverse package, and use read_csv(), filter(), select(), group_by(), and summarise() to complete the same tasks.