Day 3 of 5
⏱ ~60 minutes
R Programming in 5 Days — Day 3

Data Visualization with ggplot2

ggplot2 is one of the most widely used data visualization libraries and the standard plotting package in the R ecosystem. Based on the Grammar of Graphics, it builds plots by layering components. Today you produce publication-quality charts and learn the principles behind them.

The Grammar of Graphics

ggplot2 maps data variables to visual properties called aesthetics (x, y, color, size, shape) and draws them with geometric objects called geoms (points, lines, bars, boxes). Every plot starts with ggplot(data, aes(x=var, y=var)) and then adds layers with +. geom_point() makes scatter plots; geom_line() line charts; geom_histogram() histograms; geom_boxplot() box plots. For bar charts, geom_bar() counts the rows in each category, while geom_col() plots y values you supply yourself. Facets (facet_wrap, facet_grid) create small multiples.
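The layering idea is easier to see in code. The following is a minimal sketch, assuming the mpg dataset that ships with ggplot2; the choice of variables is only for illustration.

```r
library(ggplot2)

# mpg ships with ggplot2: displ = engine displacement, hwy = highway mpg
p <- ggplot(mpg, aes(x = displ, y = hwy)) +  # data + aesthetic mappings
  geom_point(aes(color = drv)) +             # geom layer: one point per row
  facet_wrap(~class)                         # small multiples: one panel per class

print(p)
```

Each + adds one component, so you can build a plot up step by step and inspect it after every addition.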

Scales, Themes, and Labels

Scales control how data maps to visual properties: scale_color_brewer() uses ColorBrewer palettes (designed for clarity and accessibility), scale_x_log10() transforms axes, scale_fill_manual() sets custom colors. Themes control non-data elements: theme_minimal(), theme_classic(), theme_bw(). labs(title, x, y, color) sets labels. theme() customizes individual elements, for example theme(legend.position='bottom').
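As a rough illustration of how a scale, labels, and a theme combine, here is a small sketch on mtcars. The hex colors and label text are arbitrary choices, not recommendations from this lesson.

```r
library(ggplot2)

# In mtcars, am is 0 for automatic and 1 for manual transmission
p <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(am))) +
  geom_bar(position = 'dodge') +                        # counts rows per group
  scale_fill_manual(
    values = c('0' = '#1b9e77', '1' = '#d95f02'),       # custom colors (arbitrary)
    breaks = c('0', '1'),
    labels = c('Automatic', 'Manual')
  ) +
  labs(
    title = 'Cars by Cylinder Count and Transmission',
    x     = 'Cylinders',
    y     = 'Number of cars',
    fill  = 'Transmission'
  ) +
  theme_classic() +                                     # preset theme first
  theme(legend.position = 'bottom')                     # then adjust one element

print(p)
```

Put theme() after the preset theme: a complete theme such as theme_classic() replaces any earlier theme settings.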

Advanced ggplot2: Statistical Layers

stat_smooth() (effectively an alias of geom_smooth()) adds a smoothed trend line with a confidence band; method = 'lm' gives a linear regression fit. geom_violin() shows distribution shape. geom_density_2d() shows 2D density contours. geom_tile() makes heat maps. coord_flip() swaps the x and y axes, which is handy for horizontal bar charts. The patchwork package combines multiple plots into one figure. ggsave() exports to PDF, PNG, or SVG (SVG requires the svglite package) at the size and resolution you specify, so plots are usually print-ready without post-processing.
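The sketch below shows two of these layers side by side. It assumes the faithfuld dataset bundled with ggplot2 and treats patchwork as an optional dependency; if patchwork is not installed, the plots are simply printed one after the other.

```r
library(ggplot2)

# Violin plot: distribution shape of mpg within each cylinder group
violin <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_violin(fill = 'grey80') +
  labs(x = 'Cylinders', y = 'Miles per Gallon')

# Heat map: geom_tile() draws one tile per (x, y) cell, filled by a value.
# faithfuld ships with ggplot2 and holds a precomputed 2D density estimate.
heat <- ggplot(faithfuld, aes(x = waiting, y = eruptions, fill = density)) +
  geom_tile()

# Combine the plots if patchwork is available (assumption: optional package)
if (requireNamespace('patchwork', quietly = TRUE)) {
  combined <- patchwork::wrap_plots(violin, heat, ncol = 2)
  print(combined)
} else {
  print(violin)
  print(heat)
}
```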

r
library(ggplot2)
library(dplyr)

# Scatter plot with regression line
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 3, alpha = 0.8) +
  geom_smooth(method = 'lm', se = TRUE, aes(group = 1),
              color = 'black', linetype = 'dashed') +
  scale_color_brewer(palette = 'Set1', name = 'Cylinders') +
  labs(
    title    = 'Car Weight vs Fuel Efficiency',
    subtitle = 'Data: mtcars | Dashed line: linear fit',
    x        = 'Weight (1000 lbs)',
    y        = 'Miles per Gallon'
  ) +
  theme_minimal(base_size = 13)

# Save the most recent plot. PDF is vector output, so dpi only
# matters for raster formats such as PNG.
ggsave('mpg_plot.pdf', width = 8, height = 5, dpi = 300)

# Faceted bar chart
ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar(position = 'fill') +
  facet_wrap(~color) +
  coord_flip() +
  labs(y = 'Proportion', title = 'Diamond Cut by Clarity and Color') +
  theme_bw()
💡
Save ggplot2 graphics with ggsave() as PDF or SVG for scalable vector output. PNG at 300 DPI is fine for presentations. Avoid screenshotting plots from the RStudio Plots pane: the image is captured at screen resolution and will look poor in print.
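A short sketch of the tip above, saving the same plot as both vector and raster output. The file names and the use of tempdir() are placeholders for illustration; substitute your own output folder.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

out_dir  <- tempdir()                          # placeholder output location
pdf_path <- file.path(out_dir, 'mpg_plot.pdf')
png_path <- file.path(out_dir, 'mpg_plot.png')

# Passing plot = p is safer than relying on "the last plot displayed"
ggsave(pdf_path, plot = p, width = 8, height = 5)             # vector, size in inches
ggsave(png_path, plot = p, width = 8, height = 5, dpi = 300)  # raster, 2400 x 1500 px
```

Width and height default to inches, so the pixel size of a PNG is the size in inches multiplied by dpi.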
📝 Day 3 Exercise
Build a Complete Analysis Dashboard
  1. Load the gapminder package: install.packages('gapminder'); library(gapminder)
  2. Create a scatter plot of GDP per capita vs life expectancy, colored by continent
  3. Add stat_smooth(method='loess') to show the trend
  4. Filter the data to 5 years (for example 1967, 1977, 1987, 1997, 2007), then use facet_wrap(~year) to create small multiples
  5. Export to a PDF with ggsave() at 300 DPI

Day 3 Summary

  • ggplot2 builds plots by layering data, aesthetics, and geometric objects
  • Aesthetics (aes): x, y, color, fill, size, shape, alpha map data to visuals
  • Geoms: geom_point, geom_line, geom_col, geom_histogram, geom_boxplot
  • Facets create small multiples: facet_wrap (1D) and facet_grid (2D)
  • ggsave() exports print-ready PDF/PNG/SVG at any resolution
Challenge

Create a complete ggplot2 report with 4 publication-quality plots analyzing a dataset of your choice. Each plot should use a different geom type, have proper titles/axis labels, and use a colorblind-accessible palette.
