Single-table queries are simple. Real applications have dozens of related tables. JOINs let you query them together. Indexes make those queries fast even with millions of rows. Both are essential.
Before JOINs make sense, you need related tables. Here's a schema we'll use throughout this lesson:
CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    plan VARCHAR(20) DEFAULT 'free'
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    amount NUMERIC(10,2) NOT NULL,
    status VARCHAR(20) DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE order_items (
    id SERIAL PRIMARY KEY,
    order_id INTEGER REFERENCES orders(id),
    product VARCHAR(100),
    quantity INTEGER,
    unit_price NUMERIC(10,2)
);
-- The REFERENCES keyword creates a foreign key constraint
-- PostgreSQL will reject an order with a customer_id that doesn't exist
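To follow along, seed the tables with a few rows. The names, emails, and amounts below are invented for illustration (they assume a fresh database where SERIAL ids start at 1); any values will do:

```sql
-- Illustrative seed data (made-up values)
INSERT INTO customers (name, email, plan) VALUES
    ('Ada', 'ada@example.com', 'pro'),
    ('Bob', 'bob@example.com', 'free');

INSERT INTO orders (customer_id, amount, status) VALUES
    (1, 120.00, 'completed'),
    (1, 35.50, 'pending');

-- The foreign key in action: this INSERT fails because customer 999 does not exist
INSERT INTO orders (customer_id, amount) VALUES (999, 10.00);
-- ERROR: insert or update on table "orders" violates foreign key constraint
```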
-- Get orders with customer names
SELECT
    o.id AS order_id,
    c.name AS customer_name,
    c.email,
    o.amount,
    o.status
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'completed'
ORDER BY o.created_at DESC;
-- INNER JOIN only returns rows where the join condition matches
-- Orders without a matching customer are excluded
-- All customers, even those with no orders
SELECT
    c.name,
    c.plan,
    COUNT(o.id) AS total_orders,
    SUM(o.amount) AS total_spent
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name, c.plan
ORDER BY total_spent DESC NULLS LAST;
-- For customers with no orders, COUNT(o.id) returns 0 (COUNT never returns NULL),
-- but SUM(o.amount) returns NULL
-- Use COALESCE to replace the NULL sum:
SELECT
    c.name,
    COUNT(o.id) AS total_orders,
    COALESCE(SUM(o.amount), 0.00) AS total_spent
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;
-- Orders with customer info and line items
SELECT
    c.name AS customer,
    o.id AS order_id,
    o.status,
    oi.product,
    oi.quantity,
    oi.unit_price,
    (oi.quantity * oi.unit_price) AS line_total
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id
INNER JOIN order_items oi ON oi.order_id = o.id
WHERE o.status = 'completed'
ORDER BY o.id, oi.product;
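JOINs and aggregation combine naturally. As a sketch using the same schema, the line items above can be rolled up to one total per order:

```sql
-- One row per completed order, with the total computed from its line items
SELECT
    o.id AS order_id,
    c.name AS customer,
    SUM(oi.quantity * oi.unit_price) AS order_total
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id
INNER JOIN order_items oi ON oi.order_id = o.id
WHERE o.status = 'completed'
GROUP BY o.id, c.name
ORDER BY order_total DESC;
```

Note that every selected column that is not inside an aggregate (here o.id and c.name) must appear in GROUP BY.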
Without indexes, PostgreSQL scans every row to find matches. With the right indexes, it jumps directly to the rows it needs. The difference is dramatic at scale.
-- Check if a query uses an index (EXPLAIN ANALYZE)
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;
-- Create indexes on columns you filter or JOIN on
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_created_at ON orders(created_at DESC);
-- Composite index for queries that filter on multiple columns
CREATE INDEX idx_orders_status_date
ON orders(status, created_at DESC);
-- A unique index also enforces uniqueness
-- (the UNIQUE constraint on customers.email already created one automatically,
-- so this particular statement is redundant for our schema; use it for columns
-- that lack a UNIQUE constraint)
CREATE UNIQUE INDEX idx_customers_email ON customers(email);
-- See all indexes on a table
\d orders
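The \d meta-command only works in psql. From any SQL client, the same information is available from PostgreSQL's pg_indexes catalog view:

```sql
-- List every index on the orders table, with its definition
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'orders';
```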
Index the columns you filter and JOIN on; foreign key columns (customer_id, order_id) are the most common candidates. Don't over-index: indexes slow down INSERT/UPDATE because they need to be maintained.

-- Run this before and after adding an index
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 42 AND status = 'completed';
-- Look for:
-- "Seq Scan" = full table scan (slow on large tables)
-- "Index Scan" = using an index (fast)
-- "rows=N" = estimated rows
-- "actual time=N..N" = actual execution time in ms
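For reference, the before/after plans typically look something like the following. The costs and timings are placeholders (shown as ...), the exact plan depends on your data and on which indexes exist:

```sql
-- Before the index (sequential scan):
--   Seq Scan on orders  (cost=...) (actual time=... rows=... loops=1)
--     Filter: ((customer_id = 42) AND ((status)::text = 'completed'::text))
--
-- After creating idx_orders_customer_id (index scan):
--   Index Scan using idx_orders_customer_id on orders  (cost=...) (actual time=...)
--     Index Cond: (customer_id = 42)
--     Filter: ((status)::text = 'completed'::text)
```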
Exercises:
1. Create the customers, orders, and order_items tables from this lesson.
2. Run EXPLAIN ANALYZE on a query filtering by customer_id.
3. Add an index on orders.customer_id and run EXPLAIN ANALYZE again. Note the difference.

Remember: table aliases keep multi-table queries readable (e.g. FROM orders o), and EXPLAIN ANALYZE shows the query plan with actual execution time; use it to find slow queries.

Challenge: find all customers who have placed more than 2 orders totaling more than $500. Write it as a single query using JOINs and HAVING (not WHERE; HAVING filters after GROUP BY).