Day 2 of 5
⏱ ~50 minutes
PostgreSQL for Developers — Day 2

JOINs and Indexes: The Skills That Make Databases Fast

Single-table queries are simple. Real applications have dozens of related tables. JOINs let you query them together. Indexes make those queries fast even with millions of rows. Both are essential.

Understanding Table Relationships

Before JOINs make sense, you need related tables. Here's a schema we'll use throughout this lesson:

sql
CREATE TABLE customers (
    id      SERIAL PRIMARY KEY,
    name    VARCHAR(100) NOT NULL,
    email   VARCHAR(100) UNIQUE NOT NULL,
    plan    VARCHAR(20)  DEFAULT 'free'
);

CREATE TABLE orders (
    id          SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    amount      NUMERIC(10,2) NOT NULL,
    status      VARCHAR(20)   DEFAULT 'pending',
    created_at  TIMESTAMP     DEFAULT NOW()
);

CREATE TABLE order_items (
    id          SERIAL PRIMARY KEY,
    order_id    INTEGER REFERENCES orders(id),
    product     VARCHAR(100),
    quantity    INTEGER,
    unit_price  NUMERIC(10,2)
);

-- The REFERENCES keyword creates a foreign key constraint
-- PostgreSQL will reject an order with a customer_id that doesn't exist

INNER JOIN — Match Rows That Exist in Both Tables

sql
-- Get orders with customer names
SELECT
    o.id          AS order_id,
    c.name        AS customer_name,
    c.email,
    o.amount,
    o.status
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'completed'
ORDER BY o.created_at DESC;

-- INNER JOIN only returns rows where the join condition matches
-- Orders without a matching customer are excluded

LEFT JOIN — Keep All Rows From the Left Table

sql
-- All customers, even those with no orders
SELECT
    c.name,
    c.plan,
    COUNT(o.id)     AS total_orders,
    SUM(o.amount)   AS total_spent
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name, c.plan
ORDER BY total_spent DESC NULLS LAST;

-- Customers with no orders have NULL for total_orders and total_spent
-- Use COALESCE to replace NULL:
SELECT
    c.name,
    COALESCE(COUNT(o.id), 0)       AS total_orders,
    COALESCE(SUM(o.amount), 0.00)  AS total_spent
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;

Multi-Table JOINs

sql
-- Orders with customer info and line items
SELECT
    c.name        AS customer,
    o.id          AS order_id,
    o.status,
    oi.product,
    oi.quantity,
    oi.unit_price,
    (oi.quantity * oi.unit_price) AS line_total
FROM orders o
INNER JOIN customers c    ON c.id = o.customer_id
INNER JOIN order_items oi ON oi.order_id = o.id
WHERE o.status = 'completed'
ORDER BY o.id, oi.product;

Indexes — Making Queries Fast

Without indexes, PostgreSQL scans every row to find matches. With the right indexes, it jumps directly to the rows it needs. The difference is dramatic at scale.

sql
-- Check if a query uses an index (EXPLAIN ANALYZE)
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

-- Create indexes on columns you filter or JOIN on
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_orders_status       ON orders(status);
CREATE INDEX idx_orders_created_at   ON orders(created_at DESC);

-- Composite index for queries that filter on multiple columns
CREATE INDEX idx_orders_status_date
ON orders(status, created_at DESC);

-- Unique index (also enforces uniqueness)
CREATE UNIQUE INDEX idx_customers_email ON customers(email);

-- See all indexes on a table
\d orders
💡
Index the columns you JOIN and WHERE on. Foreign key columns (customer_id, order_id) are the most common candidates. Don't over-index — indexes slow down INSERT/UPDATE because they need to be maintained.

EXPLAIN ANALYZE — See What PostgreSQL Is Doing

sql
-- Run this before and after adding an index
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 42 AND status = 'completed';

-- Look for:
-- "Seq Scan" = full table scan (slow on large tables)
-- "Index Scan" = using an index (fast)
-- "rows=N" = estimated rows
-- "actual time=N..N" = actual execution time in ms
📝 Exercise
Build a Normalized Schema With JOINs
  1. Create the customers, orders, and order_items tables from this lesson.
  2. Insert 3 customers, 5 orders (spread across customers), and 10 order items.
  3. Write a LEFT JOIN query showing all customers and their total order count (including customers with 0 orders).
  4. Write a multi-table JOIN showing each order with its customer name and total item count.
  5. Run EXPLAIN ANALYZE on a query filtering by customer_id.
  6. Add an index on orders.customer_id and run EXPLAIN ANALYZE again. Note the difference.

Lesson Summary

  • INNER JOIN returns only matching rows. LEFT JOIN keeps all left-table rows, filling NULLs for unmatched right-table rows.
  • Always alias tables in multi-JOIN queries for readability: FROM orders o.
  • Foreign key columns and frequently-filtered columns are the best candidates for indexes.
  • EXPLAIN ANALYZE shows the query plan and actual execution time — use it to find slow queries.
Challenge

Find all customers who have placed more than 2 orders totaling more than $500. Write it as a single query using JOINs and HAVING (not WHERE — HAVING filters after GROUP BY).

Finished this lesson?