R was built by statisticians for statistical computing. It is widely used in academia and data science for statistical analysis, hypothesis testing, and publication-quality visualization. Today you learn R's unusual data model and core operations.
R has no separate scalar type: a single number is a vector of length 1. Arithmetic and most built-in functions apply element-wise, so c(1,2,3) * 2 returns c(2,4,6). Because of this vectorized design, most operations need no explicit loop. They work on entire vectors at once, which is both concise and fast, since the loop runs in R's underlying compiled C code. The <- operator assigns; = also works at the top level, but <- is idiomatic R.
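As a minimal sketch of the point above, the snippet below compares a vectorized multiplication with the equivalent explicit loop. The variable names (v, doubled, looped) are illustrative only.

```r
# Element-wise arithmetic on a whole vector
v <- c(1, 2, 3)
doubled <- v * 2            # 2 4 6

# A "scalar" is just a vector of length 1
length(42)                  # 1
is.vector(42)               # TRUE

# The same result with an explicit loop, for comparison
looped <- numeric(length(v))
for (i in seq_along(v)) {
  looped[i] <- v[i] * 2
}
identical(doubled, looped)  # TRUE
```

Both versions produce the same vector; the vectorized form is shorter and avoids interpreting the loop body once per element.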
Atomic types: numeric (double), integer (1L), character ('hello'), logical (TRUE/FALSE), complex, and raw. Data structures: vector (1D, same type), matrix (2D, same type), list (1D, mixed types), data frame (2D, columns can be different types), and factor (categorical variable with levels). The is.*() family tests types; as.*() coerces. NA represents missing values; NULL represents the absence of an object.
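The following sketch illustrates the type tests, coercion rules, and the NA/NULL distinction described above. The example values and names (mixed, nums, grades) are made up for illustration.

```r
# Inspecting atomic types
typeof(3.14)      # "double"
typeof(1L)        # "integer"
typeof('hello')   # "character"
typeof(TRUE)      # "logical"

# Mixing types in c() coerces everything to the most general type
mixed <- c(1, 'a', TRUE)
typeof(mixed)     # "character"

# Explicit coercion; values that cannot be converted become NA
# (as.numeric() would normally emit a warning here)
nums <- suppressWarnings(as.numeric(c('10', '2.5', 'abc')))
is.na(nums)       # FALSE FALSE TRUE

# NA is a placeholder inside a vector; NULL is no object at all
length(NA)        # 1
length(NULL)      # 0
mean(c(1, NA, 3))                 # NA
mean(c(1, NA, 3), na.rm = TRUE)   # 2

# A factor stores categories as integer codes plus a set of levels
grades <- factor(c('A', 'C', 'A', 'B'))
levels(grades)      # "A" "B" "C"
as.integer(grades)  # 1 3 1 2
```

Note that most summary functions return NA when any input is missing unless na.rm = TRUE is passed.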
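R has three subsetting operators: [ (returns an object of the same type as its container, possibly with several elements), [[ (extracts a single element, dropping the container), and $ (extracts a named element; x$name is shorthand for x[['name']], except that $ allows partial name matching on lists). Subsetting with a logical vector extracts the elements where it is TRUE: x[x > 5] returns the values greater than 5. Negative indices exclude: x[-1] drops the first element. A data frame column can be extracted as df$col, df[, 'col'], or df[[1]] (by position).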
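To make the difference between the operators concrete, here is a small sketch on a named vector and a list. The objects x, lst, and y are hypothetical examples, not part of the lesson's data.

```r
# [ keeps the container; [[ extracts one element
x <- c(a = 3, b = 8, c = 12)
x[x > 5]      # named vector with elements b and c
x[-1]         # everything except the first element
x[['b']]      # 8, with the name dropped

lst <- list(id = 1L, tags = c('r', 'stats'))
class(lst['tags'])     # "list"      (a list of length 1)
class(lst[['tags']])   # "character" (the element itself)
identical(lst$tags, lst[['tags']])   # TRUE

# Caveat: NA in a logical index yields NA in the result
y <- c(2, NA, 9)
y[y > 5]          # NA 9
y[which(y > 5)]   # 9, since which() drops NA positions
```

Wrapping the condition in which() is one common way to avoid carrying NA values through a logical subset.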
# Vectors are the foundation
x <- c(1, 4, 9, 16, 25)
cat('Square roots:', sqrt(x), '\n')
# Vectorized operations (no loop needed)
temps_c <- c(0, 20, 37, 100)
temps_f <- temps_c * 9/5 + 32
cat('Fahrenheit:', temps_f, '\n') # 32 68 98.6 212
# Data frame basics
df <- data.frame(
  name = c('Alice', 'Bob', 'Carol', 'Dave'),
  score = c(92, 78, 95, 85),
  grade = c('A', 'C', 'A', 'B'),
  stringsAsFactors = FALSE  # keep strings as character; the default since R 4.0.0
)
# Subsetting
df[df$score >= 90, ] # rows where score >= 90
df[df$grade == 'A', 'name'] # names of A students
subset(df, score > 80, select = c(name, score)) # rows with score > 80, two columns only
# Summary statistics
cat('Mean:', mean(df$score), '\n')
cat('SD: ', sd(df$score), '\n')
summary(df)
Load the built-in 'iris' dataset. Compute the mean and SD of all four numeric columns grouped by Species. Identify which species has the largest average petal length and which has the smallest variation in sepal width.