In This Guide
- Why Queries Get Slow and How Indexes Fix It
- How B-Tree Indexes Work
- Index Types: B-Tree, Hash, GIN, and More
- When to Add an Index (and When Not To)
- Composite Indexes: Column Order Matters
- Covering Indexes: The Ultimate Performance Trick
- Reading Query Plans with EXPLAIN
- Common Indexing Mistakes
- Frequently Asked Questions
Key Takeaways
- A well-placed index turns a sequential scan into a B-tree traversal — the difference between scanning 10 million rows and checking 23 nodes. This is where "100x faster" comes from.
- Indexes cost write performance. Every INSERT, UPDATE, and DELETE must update all indexes on the table. Index everything you need; index nothing you do not.
- Column order in composite indexes is not arbitrary. The index is only useful for queries that filter on a left-prefix of the indexed columns. Understanding this prevents hours of debugging mysterious slow queries.
- Use EXPLAIN ANALYZE before and after adding an index. Do not guess — measure. The query plan tells you exactly what the optimizer is doing.
The single most impactful performance optimization in most web applications is not caching, not horizontal scaling, and not query rewriting — it is adding the right database index. I have seen applications go from timing out under load to handling 10x the traffic with a single CREATE INDEX statement. Understanding indexes is not optional knowledge for backend developers — it is foundational.
Why Queries Get Slow and How Indexes Fix It
Without an index, finding rows that match a condition requires a full table scan: the database reads every row and checks whether it matches. For a table with 10 million rows, a query like SELECT * FROM orders WHERE customer_id = 12345 reads all 10 million rows, even though only a handful match. As the table grows, the query gets linearly slower.
An index on customer_id lets the database jump directly to the matching rows. The index contains a sorted structure that maps customer_id values to row locations. The database traverses this structure in O(log n) time — about 23 steps for 10 million rows — then jumps directly to the matching rows. The query that took seconds now takes milliseconds.
How B-Tree Indexes Work
The B-tree (balanced tree) is the default index type in PostgreSQL, MySQL, SQLite, and most other relational databases. It is a self-balancing tree that keeps all leaf nodes at the same depth, ensuring consistent O(log n) lookup regardless of which value you search for.
Structure of a B-tree index:
- Root node: The top of the tree. Contains key values that divide the key space into ranges, and pointers to child nodes.
- Internal nodes: Intermediate levels between root and leaves. Each contains key values and pointers to child nodes.
- Leaf nodes: The bottom level. Contains the actual indexed values and pointers (row IDs / heap tuple IDs) to the corresponding rows in the table.
- Linked leaf nodes: Leaf nodes are linked in order, enabling efficient range scans:
WHERE price BETWEEN 100 AND 500traverses the tree to find 100, then follows the leaf node links to 500.
This structure explains why B-tree indexes support equality lookups (=), range queries (<, >, BETWEEN), prefix matching (LIKE 'abc%'), and ORDER BY — but not suffix matching (LIKE '%abc'), which would require reading the entire leaf level.
Index Types: B-Tree, Hash, GIN, and More
B-Tree (Default)
Best for: equality comparisons, range queries, ordering, prefix searches. The right choice for 90% of indexes.
Hash
Best for: exact equality lookups only (=). Hash indexes can be faster than B-tree for equality-only lookups, but they do not support range queries, ordering, or prefix searches. PostgreSQL hash indexes are crash-safe since PostgreSQL 10. Limited use cases — B-tree is usually the better default.
GIN (Generalized Inverted Index)
Best for: indexing array columns, JSONB columns, and full-text search. A GIN index on a JSONB column lets you efficiently query nested JSON values: WHERE metadata->>'status' = 'active'. Essential for PostgreSQL applications that use JSONB extensively.
GiST (Generalized Search Tree)
Best for: geometric data types, full-text search (alternative to GIN), and network address types. Used for PostGIS geospatial queries.
Partial Indexes
A partial index includes only rows that satisfy a condition. CREATE INDEX ON orders (customer_id) WHERE status = 'pending' creates a smaller, faster index for the common query pattern of finding pending orders. The index is smaller (only pending orders) and the queries that use it are faster.
When to Add an Index (and When Not To)
Add an index when:
- The column appears frequently in WHERE clauses, JOIN conditions, or ORDER BY on queries that are slow
- The column has high cardinality (many distinct values). An index on a boolean column with only two values is rarely useful.
- The table has enough rows that a full scan is noticeably slow (typically 10,000+ rows, depending on the query)
- EXPLAIN shows a sequential scan (Seq Scan in PostgreSQL) on a large table
Do not add an index when:
- The table is small — full scans on small tables are fast, and index overhead is not worth it
- The column has low cardinality (status columns with 3–5 values, boolean columns) — the optimizer may choose a full scan anyway
- The table is write-heavy — each index adds overhead to every write operation
- The index would never be used based on actual query patterns
Composite Indexes: Column Order Matters
A composite index on (a, b, c) supports queries that filter on a, on (a, b), or on (a, b, c) — but not queries that only filter on b or c without a. This "left-prefix" rule is the most commonly misunderstood aspect of indexing.
-- Index: CREATE INDEX ON orders (customer_id, status, created_at) -- USES the index (matches left-prefix) SELECT * FROM orders WHERE customer_id = 123; SELECT * FROM orders WHERE customer_id = 123 AND status = 'shipped'; SELECT * FROM orders WHERE customer_id = 123 AND status = 'shipped' AND created_at > '2025-01-01'; -- DOES NOT use the index (skips customer_id) SELECT * FROM orders WHERE status = 'shipped'; SELECT * FROM orders WHERE created_at > '2025-01-01';
Column order in a composite index should reflect the most common query patterns: put the most selective filter first, then the next most selective, and so on. If queries filter equally often by different column combinations, consider separate indexes for each pattern.
Covering Indexes: The Ultimate Performance Trick
A covering index includes all columns needed to answer a query — filter conditions and selected columns — so the database never has to touch the main table. This eliminates the random I/O of row lookups entirely.
-- Common query SELECT email, name FROM users WHERE status = 'active' ORDER BY created_at; -- Non-covering index (still needs table access) CREATE INDEX ON users (status); -- Covering index (satisfies entire query from index) CREATE INDEX ON users (status, created_at, email, name); -- PostgreSQL supports INCLUDE columns for this pattern: CREATE INDEX ON users (status, created_at) INCLUDE (email, name);
PostgreSQL's INCLUDE clause (added in PG 11) allows adding columns to the leaf level of the index without including them in the B-tree structure — they are just along for the ride to support covering queries without affecting the B-tree ordering.
Reading Query Plans with EXPLAIN
-- PostgreSQL: see the query plan with actual execution times EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 12345 AND status = 'pending'; -- Look for: -- Seq Scan → full table scan (may need an index) -- Index Scan → using the index correctly -- Index Only Scan → covering index (best for reads) -- Bitmap Index Scan → multi-index scan merged -- Rows / actual rows mismatch → stale statistics (run ANALYZE)
Key metrics to examine in EXPLAIN ANALYZE: the node type (Seq Scan vs Index Scan), the estimated vs actual row counts (large discrepancies indicate stale statistics), and the total cost. A high-cost Seq Scan on a large table is the clearest signal that an index is needed.
Common Indexing Mistakes
- Indexing every column: Indexes slow down writes and consume disk space. Only index what queries actually need.
- Wrong column order in composite indexes: The left-prefix rule means a (b, a) index does not help a query that only filters on a.
- Not running EXPLAIN: Adding an index without checking whether the optimizer actually uses it. The optimizer may choose a sequential scan even with an index if the table is small or the predicate is not selective.
- Implicit type conversion prevents index use:
WHERE id = '123'where id is an integer may trigger a sequential scan because the type conversion prevents index matching. - Functions on indexed columns:
WHERE LOWER(email) = '[email protected]'does not use an index on email — it uses a full scan. Use a functional index:CREATE INDEX ON users (LOWER(email)).
Build database skills that scale to production. Hands-on training.
The Precision AI Academy bootcamp covers databases, data engineering, and AI integration. $1,490. October 2026. 40 seats per city.
Reserve Your SeatFrequently Asked Questions
How do database indexes work?
Most relational databases use B-tree indexes. The index stores a sorted copy of the indexed column values with pointers to corresponding rows. To find matching rows, the database traverses the B-tree in O(log n) steps rather than scanning every row in O(n). For 10 million rows, that is 23 steps vs 10 million.
When should you add a database index?
Add an index when a column is frequently used in WHERE clauses, JOIN conditions, or ORDER BY in slow queries; the column has high cardinality; and the table has enough rows that full scans are slow. Do not index every column — indexes slow writes and consume disk space.
What is a covering index?
A covering index includes all columns needed to satisfy a query — filter conditions and selected columns — so the database answers the query entirely from the index without accessing the main table. This eliminates the random I/O cost of row lookups and is the highest-performance optimization for read-heavy query patterns.
How do you identify which queries need indexes?
Use EXPLAIN ANALYZE (PostgreSQL) or EXPLAIN (MySQL) to see the query execution plan. Look for sequential scans on large tables. Enable slow query logging to capture queries exceeding a time threshold. pg_stat_statements in PostgreSQL tracks cumulative query statistics and is invaluable for identifying frequently executed slow queries.
Database performance is a career-defining skill. Master it.
Two days of hands-on training in databases, data engineering, and AI. $1,490. Denver, NYC, Dallas, LA, Chicago. October 2026.
Reserve Your SeatNote: Index behavior varies between database systems. Examples use PostgreSQL syntax. MySQL, SQLite, and other databases have similar concepts with different syntax details.