Welcome to the Fundamentals!
This section covers the essentials you need to get started with the concepts and tools for data science.
Integrated Development Environments (IDEs)
An IDE (Integrated Development Environment) is your toolkit for writing code efficiently.
RStudio
Popular IDE specifically designed for R programming
Source Editor (top-left): Write and edit R code
Console (bottom-left): Code execution and results
Environment/History (top-right): Variables and command history
Files/Plots/Packages/Help (bottom-right): File browser and help
Tip: Customize the layout via View → Panes → Pane Layout
VS Code
Versatile IDE supporting multiple languages including R and Python
Activity Bar (left): Switch between views (Explorer, Search, Source Control, Extensions)
Side Bar (left): File explorer and other views
Editor (center): Write your code
Console (bottom): Run commands and see output
Output (right): Preview visual outputs/documents
Popular Extensions: Python, R, Pylance, Quarto
Positron
New IDE from Posit (makers of RStudio) supporting R and Python equally
Note: Currently in beta but shows great promise for bilingual data scientists
The fundamentals of R
how to organize your work
basic data types and structures
using R for calculations
Organizing Your Work
Project structure and file paths
Recommended Folder Structure
my-project/
├── data/
│   ├── raw/          # Original data files
│   └── processed/    # Cleaned data
├── code/
│   ├── analysis.R
│   └── plots.R
├── outputs/
│   ├── figures/
│   └── results/
├── README.md
└── my-project.Rproj
Tips:
Keep your work organised in project folders for easy maintenance.
Use RStudio projects (.Rproj files) to manage your R work.
Use meaningful folder and file names.
Keep data separate from code and outputs.
Keep raw data unchanged; process copies instead.
Working with Paths in R
Relative paths (from current working directory):
# Set working directory
setwd("C:/Users/Alice/my-project")
# Read data using relative path
data <- read.csv("data/raw/mydata.csv")
Absolute paths (full path from root):
# Read using absolute path
data <- read.csv("C:/Users/Alice/my-project/data/raw/mydata.csv")
Best practice: Prefer relative paths. Absolute paths break as soon as the project is moved or shared with someone else.
Data Types in R
# Numeric, Integer, Character, Logical
x <- 42 # numeric
y <- 42L # integer (the L suffix must directly follow the number)
name <- "Alice" # character
flag <- TRUE # logical
class(x) # "numeric"
typeof(x) # "double"
Key Types:
numeric: Real numbers
integer: Whole numbers only
character: Text strings
logical: TRUE or FALSE
Vectors in R
Sequences of values of the same type:
numbers <- c(1, 2, 3, 4, 5)
names <- c("Alice", "Bob", "Charlie")
flags <- c(TRUE, FALSE, TRUE)
first_number <- numbers[1] # R indexing starts at 1
first_three <- numbers[1:3]
Data Frames in R
Tables with rows and columns:
students <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(20, 21, 19),
  grade = c("A", "B", "A")
)
students$name # access column
students[1, ] # access row
head(students) # view data
Lists in R
Flexible containers for mixed types:
my_list <- list(
  name = "Alice",
  age = 20,
  scores = c(85, 90, 88),
  data = data.frame(x = 1:3, y = 4:6)
)
my_list$name # access element by name
my_list[[1]] # access element by position
R as a Calculator
Simple operations:
2 + 3 # Addition
10 - 4 # Subtraction
5 * 6 # Multiplication
20 / 4 # Division
2 ^ 3 # Exponentiation
10 %% 3 # Modulo
See: Arithmetic Operators
Main Operators in R
Two main operators you’ll use often:
Assignment: <- or = to assign values to variables
# Assignment
x <- 10
y = 20
total <- x + y # avoid naming a variable `sum`, which would shadow the built-in function
Pipe: |> chains commands (introduced in R 4.1.0); read it as “then”
result <- c(1, 2, 3, 4, 5) |> # create a vector, `then`
  sum() |> # calculate the sum, `then`
  sqrt() # calculate the square root
print(result)
Subsetting data
Extract specific elements from data structures:
vec <- c(10, 20, 30, 40, 50)
first_element <- vec[1] # 10
subset <- vec[2:4] # 20, 30, 40
df <- data.frame(
  a = 1:5,
  b = 6:10,
  c = letters[1:5]
)
df$a # access column `a`
df[1, ] # access first row
More on subsetting: Advanced R
Control Flow
Making decisions and repeating tasks
If Statements in R
Execute code conditionally:
age <- 25
if (age >= 18) {
  print("Adult")
} else {
  print("Minor")
}
# Multiple conditions
if (age < 13) {
  category <- "Child"
} else if (age < 18) {
  category <- "Teen"
} else {
  category <- "Adult"
}
For Loops in R
Repeat code multiple times:
# Simple loop
for (i in 1:5) {
  print(i)
}
# Loop over vector
fruits <- c("apple", "banana", "cherry")
for (fruit in fruits) {
  print(paste("I like", fruit))
}
# Store results
results <- numeric(5)
for (i in 1:5) {
  results[i] <- i^2
}
results # 1, 4, 9, 16, 25
Using Packages in R
Packages are collections of functions that extend R's capabilities, for example support for additional data types and sources, new methods, and more efficient coding.
Installing and Loading Packages
# Install once
install.packages("tidyverse")
# Load each session
# This usually goes at the top of your script/document
library(tidyverse)
Finding Documentation
?mean # Help on mean function
help("lm") # Help on lm function
example("plot") # Examples for plot function
Key Takeaways
✅ Know your IDE (RStudio or VS Code)
✅ Understand basic data types in R
✅ Know the difference between data structures
✅ Organize your work with folder structure
✅ Use relative paths for portability
✅ Control program flow with if statements and loops
✅ Learn how to install and use packages
✅ Know where to find documentation
Python Content (Optional)
If you’re interested in Python, here are the equivalent concepts.
The Fundamentals of Python
Organising Your Work
Project structure and file paths
Recommended Folder Structure
my-project/
├── data/
│   ├── raw/          # Original data files
│   └── processed/    # Cleaned data
├── code/
│   ├── analysis.py
│   └── plots.py
├── outputs/
│   ├── figures/
│   └── results/
├── README.md
└── requirements.txt
Tips:
Keep your work organised for easy maintenance
Use meaningful folder and file names
Keep data separate from code and outputs
Keep raw data unchanged; process copies instead
Working with Paths in Python
Relative paths (from current working directory):
import os
import pandas as pd
# Change working directory
os.chdir("C:/Users/Alice/my-project")
# Read data using relative path
data = pd.read_csv("data/raw/mydata.csv")
Absolute paths (full path from root):
data = pd.read_csv("C:/Users/Alice/my-project/data/raw/mydata.csv")
Best practice: Prefer relative paths. Absolute paths break as soon as the project is moved or shared with someone else.
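One way to keep paths relative and portable is the standard-library pathlib module. The following is a minimal sketch that assumes the folder layout recommended above; data_file and full_path are illustrative names, and the file itself is not assumed to exist.

```python
from pathlib import Path

# Relative path, interpreted from the current working directory.
# The / operator joins the parts with the right separator for the OS.
data_file = Path("data") / "raw" / "mydata.csv"

print(data_file.name)           # mydata.csv
print(data_file.suffix)         # .csv
print(data_file.is_absolute())  # False

# Combine with the current working directory when an absolute path is needed.
full_path = Path.cwd() / data_file
print(full_path.is_absolute())  # True
```

A Path object like this can be passed to pd.read_csv in place of a string.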
Data Types in Python
x = 42 # int
y = 3.14 # float
name = "Alice" # str
flag = True # bool
type(x)
type(name)
Key Types:
int: Integers
float: Decimal numbers
str: Text strings
bool: True or False
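Values often arrive as text and need converting before use. As a small sketch, the built-in constructors int, float, str, and bool convert between these types; the variable names here are illustrative only.

```python
# Convert between the basic types with the built-in constructors.
count = int("42")        # str -> int
price = float("3.14")    # str -> float
label = str(42)          # int -> str
is_empty = bool("")      # an empty string converts to False

print(type(count).__name__)  # int
print(count + 1)             # 43
print(label + "!")           # 42!
print(is_empty)              # False

# isinstance() checks whether a value has a given type.
print(isinstance(price, float))  # True
```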
Lists in Python
Ordered collections (can be mixed types):
numbers = [1, 2, 3, 4, 5]
names = ["Alice", "Bob", "Charlie"]
mixed = [1, "Alice", 3.14, True]
first = numbers[0] # 0-based indexing!
first_three = numbers[0:3]
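Beyond the basic indexing shown above, Python lists also accept negative indices and a step in slices. A brief sketch using the same numbers list:

```python
numbers = [1, 2, 3, 4, 5]

last = numbers[-1]          # negative indices count from the end
last_two = numbers[-2:]     # from the second-to-last element onward
every_other = numbers[::2]  # start:stop:step, here with a step of 2

print(last)         # 5
print(last_two)     # [4, 5]
print(every_other)  # [1, 3, 5]

# Slices exclude the stop index, so numbers[0:3] has three elements.
print(len(numbers[0:3]))  # 3
```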
Dictionaries in Python
Key-value pairs:
student = {
    "name": "Alice",
    "age": 20,
    "grade": "A"
}
student["name"]
student.get("age")
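The two access styles differ when a key is missing: square brackets raise a KeyError, while .get() returns a default value. The sketch below also shows updating and looping over a dictionary; the "email" and "major" keys are hypothetical additions for illustration.

```python
student = {"name": "Alice", "age": 20, "grade": "A"}

# .get() returns the given default instead of raising KeyError.
email = student.get("email", "unknown")
print(email)  # unknown

# Add or update an entry by assigning to a key.
student["age"] = 21
student["major"] = "Statistics"  # hypothetical field

# Loop over key-value pairs in insertion order.
for key, value in student.items():
    print(f"{key}: {value}")
```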
NumPy Arrays
Similar to Python lists but more efficient for numerical operations:
import numpy as np
numbers = np.array([1, 2, 3, 4, 5])
matrix = np.array([[1, 2, 3], [4, 5, 6]])
first = numbers[0]
Pandas DataFrames
Tables with rows and columns:
import pandas as pd
students = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [20, 21, 19],
    "grade": ["A", "B", "A"]
})
students["name"] # access column
students.iloc[0, :] # access row
students.head() # view data
Python as a Calculator
Simple operations:
2 + 3 # Addition
10 - 4 # Subtraction
5 * 6 # Multiplication
20 / 4 # Division
2 ** 3 # Exponentiation
10 % 3 # Modulo
See: Python Operators
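One detail worth knowing about the operators above: / always returns a float in Python 3, while // performs floor division. A short sketch of the difference:

```python
print(20 / 4)   # 5.0 (true division always returns a float)
print(7 // 2)   # 3   (floor division)
print(7 % 2)    # 1   (remainder)
print(-7 // 2)  # -4  (floor division rounds toward negative infinity)

# divmod() returns the quotient and remainder in one call.
quotient, remainder = divmod(7, 2)
print(quotient, remainder)  # 3 1
```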
Main Operators in Python
Key operators you’ll use regularly:
Assignment: = to assign values to variables
# Assignment
x = 10
y = 20
total = x + y
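Method chaining: use . to chain method calls on an object. Python has no pipe operator for plain functions, so apply them step by step and store the intermediate results:
result = [1, 2, 3, 4, 5] # create a list
sum_result = sum(result) # calculate sum
sqrt_result = sum_result ** 0.5 # calculate square root
print(sqrt_result)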
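Method chaining itself can be illustrated with string methods, since each one returns a new string that the next call operates on. A minimal sketch with a made-up input value:

```python
raw_name = "  Alice SMITH "  # hypothetical messy input

# strip() removes surrounding whitespace, lower() lowercases,
# replace() swaps the remaining space for an underscore.
clean_name = raw_name.strip().lower().replace(" ", "_")
print(clean_name)  # alice_smith
```

pandas objects support the same pattern, which is the closest Python equivalent to reading a pipeline as "then".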
Subsetting Data
Extract specific elements from data structures:
lst = [10, 20, 30, 40, 50]
first_element = lst[0] # 10
subset = lst[1:4] # [20, 30, 40]
df = pd.DataFrame({
    'a': range(1, 6),
    'b': range(6, 11),
    'c': list('abcde')})
df['a'] # access column `a`
df.iloc[0, :] # access first row
More on subsetting here: NumPy Indexing
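Subsetting by condition, rather than by position, is commonly written with a list comprehension for plain lists. A sketch using the same lst values; the threshold and the positions list are arbitrary choices for illustration.

```python
lst = [10, 20, 30, 40, 50]

# Keep only the elements that satisfy a condition.
large = [value for value in lst if value > 25]
print(large)  # [30, 40, 50]

# Keep the elements at chosen positions.
positions = [0, 2, 4]
picked = [lst[i] for i in positions]
print(picked)  # [10, 30, 50]
```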
Control Flow
Making decisions and repeating tasks
If Statements in Python
Execute code conditionally:
age = 25
if age >= 18:
    print("Adult")
else:
    print("Minor")
# Multiple conditions
if age < 13:
    category = "Child"
elif age < 18:
    category = "Teen"
else:
    category = "Adult"
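To check how the boundaries behave, the multi-branch condition can be wrapped in a small function and called with several ages. This is only a sketch; age_category is a hypothetical helper name, not something defined elsewhere in this material.

```python
def age_category(age):
    """Return "Child", "Teen", or "Adult" using the thresholds 13 and 18."""
    if age < 13:
        return "Child"
    elif age < 18:
        return "Teen"
    else:
        return "Adult"

# Conditions are tested top to bottom; the first true branch wins.
print(age_category(12))  # Child
print(age_category(13))  # Teen
print(age_category(18))  # Adult
```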
For Loops in Python
Repeat code multiple times:
# Simple loop
for i in range(1, 6):
    print(i)
# Loop over list
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(f"I like {fruit}")
# Store results
results = []
for i in range(1, 6):
    results.append(i ** 2)
print(results) # [1, 4, 9, 16, 25]
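The store-results loop above is often written more compactly as a list comprehension, and enumerate() supplies a counter when looping over a list. A short sketch of both, reusing the fruits example:

```python
# Same squares as the append loop, built in one expression.
squares = [i ** 2 for i in range(1, 6)]
print(squares)  # [1, 4, 9, 16, 25]

# enumerate() yields (position, item) pairs; start=1 counts from one.
fruits = ["apple", "banana", "cherry"]
for position, fruit in enumerate(fruits, start=1):
    print(f"{position}. {fruit}")
```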
Using Packages in Python
Packages are collections of functions that extend Python's capabilities, for example support for additional data types and sources, new methods, and more efficient coding.
Installing and Importing Packages
# Install via pip (run in terminal)
# pip install pandas
# Import in your script
# This usually goes at the top of your script/notebook
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
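The three import styles above work the same way for modules in Python's standard library, which need no installation. A sketch using only standard-library modules; the alias stats is an arbitrary choice.

```python
import math                 # import the whole module
import statistics as stats  # import with a short alias
from pathlib import Path    # import a single name from a module

print(math.sqrt(16))           # 4.0
print(stats.mean([1.5, 2.5]))  # 2.0
print(Path("data").name)       # data
```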
Finding Documentation
help(sum) # Help on sum function
help(pd.read_csv) # Help on pandas read_csv
?np.array # In Jupyter notebooks