Day 1 of 5
⏱ ~60 minutes
R Programming in 5 Days — Day 1

R Fundamentals & Data Types

R was built by statisticians for statistical computing. It is widely used in academia and data science for statistical analysis, hypothesis testing, and publication-quality visualization. Today you learn R's unusual data model and core operations.

R's Vector-First Design

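In R, nearly all data lives in vectors; there is no separate scalar type. A single number is a vector of length 1. Arithmetic applies element-wise: c(1,2,3) * 2 returns c(2,4,6). When the operands differ in length, R recycles the shorter one, which is how the single 2 is applied to every element. This vectorized design means most operations require no explicit loops: they work on entire vectors at once, which is both concise and fast (the underlying C code handles the loop). The <- operator assigns; = also works for top-level assignment, but <- is idiomatic R.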
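A minimal sketch of the vector-first behavior described above. The variable names (one, doubled, total, recycled) are illustrative only and are not used elsewhere in the lesson.

```r
# A single number is already a vector of length 1
one <- 42
length(one)       # 1
is.vector(one)    # TRUE

# Element-wise arithmetic: no loop needed
doubled <- c(1, 2, 3) * 2                  # 2 4 6

# Two vectors of equal length combine position by position
total <- c(1, 2, 3) + c(10, 20, 30)        # 11 22 33

# Recycling: the shorter vector is repeated to match the longer one
recycled <- c(1, 2, 3, 4) + c(10, 20)      # 11 22 13 24
```

Recycling is convenient with a length-1 operand, but with longer operands it can hide mistakes, so it is worth checking lengths when two vectors are meant to line up.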

Core Data Types

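Atomic types: double (the default for numbers; its class is numeric), integer (1L), character ('hello'), logical (TRUE/FALSE), complex, and raw. Combining different types in one atomic vector coerces them to the most general type present (logical, then integer, then double, then character). Data structures: atomic vector (1D, same type), matrix (2D, same type), list (1D, mixed types), data frame (2D, columns can be different types), and factor (categorical variable stored as integer codes with levels). The is.*() family tests types; as.*() coerces. NA represents a missing value; NULL represents the absence of an object.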
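The type rules above can be checked directly at the console. This is a small sketch with made-up values; the names mixed, n, scores, avg, and grade are assumptions for illustration.

```r
# typeof() reports the underlying storage type
typeof(3.14)       # "double"
typeof(1L)         # "integer"
typeof('hello')    # "character"
typeof(TRUE)       # "logical"

# Mixing types in c() coerces everything to the most general type
mixed <- c(1, 'a', TRUE)
typeof(mixed)      # "character"

# as.*() coerces explicitly; is.*() tests
n <- as.integer('42')
is.integer(n)      # TRUE

# NA marks a missing value but still occupies a slot
scores <- c(90, NA, 70)
length(scores)     # 3
is.na(scores)      # FALSE TRUE FALSE
mean(scores)       # NA, because one value is unknown
avg <- mean(scores, na.rm = TRUE)   # 80

# NULL is the absence of an object and has length 0
length(NULL)       # 0

# A factor stores categories as integer codes plus a set of levels
grade <- factor(c('A', 'C', 'A', 'B'))
levels(grade)      # "A" "B" "C"
as.integer(grade)  # 1 3 1 2
```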

Subsetting

R has three subsetting operators: [ (returns the same type of container, possibly with multiple elements), [[ (returns a single element, dropping the container), and $ (named element; x$name is shorthand for x[['name']]). Subsetting with a logical vector extracts matching elements: x[x > 5] returns values greater than 5. Negative indexing excludes: x[-1] drops the first element. Data frame columns: df$col, df[, 'col'], and df[[1]] each return the column as a vector, while df['col'] returns a one-column data frame.
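To make the difference between the three operators concrete, here is a short sketch on a hypothetical list and a small data frame (person, x, and scores_df are illustrative names, not part of the lesson's data).

```r
# A small named list (hypothetical example data)
person <- list(name = 'Alice', scores = c(92, 78, 95))

person['scores']       # [  keeps the container: a list of length 1
person[['scores']]     # [[ extracts the element: a numeric vector
person$scores          # $  is shorthand for [['scores']]

# Logical and negative indexing on a vector
x <- c(3, 8, 1, 12, 5)
big  <- x[x > 5]       # 8 12
rest <- x[-1]          # 8 1 12 5

# The same operators applied to data frame columns
scores_df <- data.frame(id = 1:3, score = c(92, 78, 95))
class(scores_df['score'])     # "data.frame": single [ keeps the container
class(scores_df[, 'score'])   # "numeric": one column via [ , ] drops to a vector
class(scores_df[['score']])   # "numeric"
```

A rule of thumb: reach for [ when you want a smaller version of the same structure, and [[ or $ when you want the contents of one element.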

r
# Vectors are the foundation
x <- c(1, 4, 9, 16, 25)
cat('Square roots:', sqrt(x), '\n')

# Vectorized operations (no loop needed)
temps_c <- c(0, 20, 37, 100)
temps_f <- temps_c * 9/5 + 32
cat('Fahrenheit:', temps_f, '\n')  # 32 68 98.6 212

# Data frame basics
df <- data.frame(
  name  = c('Alice', 'Bob', 'Carol', 'Dave'),
  score = c(92, 78, 95, 85),
  grade = c('A', 'C', 'A', 'B'),
  stringsAsFactors = FALSE
)

# Subsetting
df[df$score >= 90, ]           # rows where score >= 90
df[df$grade == 'A', 'name']   # names of A students
subset(df, score > 80, select = c(name, score))

# Summary statistics
cat('Mean:', mean(df$score), '\n')
cat('SD:  ', sd(df$score), '\n')
summary(df)
💡
Use <- for assignment, not =. While = works for top-level assignment, <- is the prevailing R convention and keeps assignment visually distinct from the = used to name function arguments. RStudio shortcut: Alt+- (Option+- on macOS) types <-.
📝 Day 1 Exercise
Explore Your First Dataset
  1. Install R from r-project.org and RStudio from posit.co
  2. Load the built-in mtcars dataset: data(mtcars); head(mtcars)
  3. Find the mean mpg, median hp, and standard deviation of wt
  4. Subset to only cars with mpg > 20 and more than 4 cylinders
  5. Use table() to count how many cars have each number of cylinders

Day 1 Summary

  • Almost all data in R is a vector, even a single number
  • Operations apply element-wise to entire vectors without explicit loops
  • Data frames are the primary tabular data structure — columns can be different types
  • Subsetting: [ returns same type; [[ returns single element; $ accesses named columns
  • NA represents missing values; use is.na() and na.rm=TRUE to handle them
Challenge

Load the built-in 'iris' dataset. Compute the mean and SD of all four numeric columns grouped by Species. Identify which species has the largest average petal length and which has the smallest variation in sepal width.
