R's core strength is statistical analysis. Today you apply hypothesis tests, build regression models, and use the broom package to work with model outputs in a tidy, consistent way.
t.test() compares means: t.test(x, y) tests whether two groups have different means. prop.test() compares proportions. chisq.test() tests independence of categorical variables. cor.test() tests whether a correlation is significantly different from zero. The output includes the test statistic, p-value, confidence interval, and degrees of freedom. R's default printed output is verbose and human-readable but not tidy; that is the gap broom fills.
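The t-test appears in the main code block below, but the other tests deserve a quick look too. A minimal sketch of chisq.test() and prop.test() with tidy output; the contingency table and counts are made-up illustrative data:

```r
library(broom)

# Chi-squared test of independence on a 2x2 contingency table
# (rows = groups A/B, columns = outcomes yes/no; counts are invented)
tbl <- matrix(c(30, 10, 20, 40), nrow = 2,
              dimnames = list(group = c("A", "B"),
                              outcome = c("yes", "no")))
chi <- chisq.test(tbl)
tidy(chi)  # one row: statistic, p.value, parameter, method

# Two-sample proportion test: 30/40 successes vs 20/60 successes
pt <- prop.test(x = c(30, 20), n = c(40, 60))
tidy(pt)   # estimates of both proportions plus p.value and CI
```

Note that chisq.test() applies Yates' continuity correction by default for 2x2 tables, as does prop.test() for two-sample comparisons.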
lm(y ~ x1 + x2, data=df) fits a linear model. summary(model) shows coefficients, standard errors, p-values, and R-squared. Interaction terms: y ~ x1 * x2. Polynomial terms: y ~ poly(x, 2). predict(model, newdata) makes predictions. Regression diagnostics: plot(model) produces four diagnostic plots: residuals vs fitted, normal Q-Q, scale-location, and residuals vs leverage with Cook's distance contours. Check the model assumptions: linearity, homoscedasticity, and normality of residuals.
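Interaction and polynomial terms are easy to try on the built-in mtcars data. A small sketch; the choice of mpg, wt, hp, and disp as variables is just for illustration:

```r
# Interaction: wt * hp expands to wt + hp + wt:hp
m_int <- lm(mpg ~ wt * hp, data = mtcars)

# Polynomial: orthogonal quadratic in displacement
m_poly <- lm(mpg ~ poly(disp, 2), data = mtcars)

summary(m_int)$coefficients  # 4 rows: intercept, wt, hp, wt:hp

# The four standard diagnostic plots on one page
par(mfrow = c(2, 2))
plot(m_int)
par(mfrow = c(1, 1))
```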
broom converts model output to tidy data frames. tidy(model) returns one row per coefficient with estimate, std.error, statistic, p.value. glance(model) returns one row of model-level statistics: R-squared, AIC, BIC, F-statistic. augment(model, data) adds fitted values and residuals to the original data. This makes it easy to plot model results with ggplot2 and compare multiple models with dplyr.
library(broom)
library(ggplot2)  # needed for the residual plot below

# t-test: do groups differ?
group_a <- c(82, 85, 88, 92, 95, 78, 90)
group_b <- c(74, 78, 82, 79, 85, 71, 83)
test_result <- t.test(group_a, group_b)
tidy(test_result)  # one-row tidy data frame:
# estimate (mean difference, ~8.29), statistic, p.value,
# parameter (Welch df), conf.low, conf.high, method

# Linear regression
model <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)
tidy(model)    # coefficient table
glance(model)  # R-squared, AIC, BIC, F-statistic

# Predict new observations
new_cars <- data.frame(wt = c(2.5, 3.0, 3.5),
                       hp = c(100, 150, 200),
                       cyl = c(4, 6, 8))
predict(model, newdata = new_cars, interval = "confidence")

# Visualize model: residuals vs fitted values
augment(model) |>
  ggplot(aes(.fitted, .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(title = "Residuals vs Fitted")
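The multi-model comparison mentioned above (glance() plus dplyr) can be sketched like this, again on mtcars; the particular nested models are illustrative assumptions:

```r
library(broom)
library(dplyr)
library(purrr)

# Fit a sequence of nested models, then stack their glance() rows
models <- list(
  wt_only   = lm(mpg ~ wt, data = mtcars),
  wt_hp     = lm(mpg ~ wt + hp, data = mtcars),
  wt_hp_cyl = lm(mpg ~ wt + hp + factor(cyl), data = mtcars)
)

comparison <- map_dfr(models, glance, .id = "model") |>
  select(model, r.squared, adj.r.squared, AIC, BIC) |>
  arrange(AIC)  # lower AIC = better fit after penalizing complexity
comparison
```

Because each glance() call returns exactly one row, map_dfr() yields one comparison table with one row per model, which is the same workflow the exercise below asks you to apply to the Boston data.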
Analyze the built-in Boston housing dataset from the MASS package. Fit multiple regression models, adding predictors one at a time. Compare the models' AIC/BIC using broom::glance(). Identify the most parsimonious model that still explains the most variance in median home value.