Database Indexing Guide: Speed Up Your Queries 100x

Database indexing explained: how B-tree indexes work, when indexes help and when they hurt, how to diagnose slow queries, composite indexes, covering indexes, and query plan analysis.

PRIMARY DB REPLICA 1 REPLICA 2 SELECT * FROM data WHERE id = $1
10-100x
Query speedup
B-tree
Default index type
1
Rule: index queries
2026
Essential DBA skill

In This Guide

  1. Why Queries Get Slow and How Indexes Fix It
  2. How B-Tree Indexes Work
  3. Index Types: B-Tree, Hash, GIN, and More
  4. When to Add an Index (and When Not To)
  5. Composite Indexes: Column Order Matters
  6. Covering Indexes: The Ultimate Performance Trick
  7. Reading Query Plans with EXPLAIN
  8. Common Indexing Mistakes
  9. Frequently Asked Questions

Key Takeaways

The single most impactful performance optimization in most web applications is not caching, not horizontal scaling, and not query rewriting — it is adding the right database index. I have seen applications go from timing out under load to handling 10x the traffic with a single CREATE INDEX statement. Understanding indexes is not optional knowledge for backend developers — it is foundational.

01

Why Queries Get Slow and How Indexes Fix It

Without an index, finding rows that match a condition requires a full table scan: the database reads every row and checks whether it matches. For a table with 10 million rows, a query like SELECT * FROM orders WHERE customer_id = 12345 reads all 10 million rows, even though only a handful match. As the table grows, the query gets linearly slower.

An index on customer_id lets the database jump directly to the matching rows. The index contains a sorted structure that maps customer_id values to row locations. The database traverses this structure in O(log n) time — about 23 steps for 10 million rows — then jumps directly to the matching rows. The query that took seconds now takes milliseconds.

02

How B-Tree Indexes Work

The B-tree (balanced tree) is the default index type in PostgreSQL, MySQL, SQLite, and most other relational databases. It is a self-balancing tree that keeps all leaf nodes at the same depth, ensuring consistent O(log n) lookup regardless of which value you search for.

Structure of a B-tree index:

This structure explains why B-tree indexes support equality lookups (=), range queries (<, >, BETWEEN), prefix matching (LIKE 'abc%'), and ORDER BY — but not suffix matching (LIKE '%abc'), which would require reading the entire leaf level.

03

Index Types: B-Tree, Hash, GIN, and More

B-Tree (Default)

Best for: equality comparisons, range queries, ordering, prefix searches. The right choice for 90% of indexes.

Hash

Best for: exact equality lookups only (=). Hash indexes can be faster than B-tree for equality-only lookups, but they do not support range queries, ordering, or prefix searches. PostgreSQL hash indexes are crash-safe since PostgreSQL 10. Limited use cases — B-tree is usually the better default.

GIN (Generalized Inverted Index)

Best for: indexing array columns, JSONB columns, and full-text search. A GIN index on a JSONB column lets you efficiently query nested JSON values: WHERE metadata->>'status' = 'active'. Essential for PostgreSQL applications that use JSONB extensively.

GiST (Generalized Search Tree)

Best for: geometric data types, full-text search (alternative to GIN), and network address types. Used for PostGIS geospatial queries.

Partial Indexes

A partial index includes only rows that satisfy a condition. CREATE INDEX ON orders (customer_id) WHERE status = 'pending' creates a smaller, faster index for the common query pattern of finding pending orders. The index is smaller (only pending orders) and the queries that use it are faster.

04

When to Add an Index (and When Not To)

Add an index when:

Do not add an index when:

05

Composite Indexes: Column Order Matters

A composite index on (a, b, c) supports queries that filter on a, on (a, b), or on (a, b, c) — but not queries that only filter on b or c without a. This "left-prefix" rule is the most commonly misunderstood aspect of indexing.

-- Index: CREATE INDEX ON orders (customer_id, status, created_at)

-- USES the index (matches left-prefix)
SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM orders WHERE customer_id = 123 AND status = 'shipped';
SELECT * FROM orders WHERE customer_id = 123 AND status = 'shipped' AND created_at > '2025-01-01';

-- DOES NOT use the index (skips customer_id)
SELECT * FROM orders WHERE status = 'shipped';
SELECT * FROM orders WHERE created_at > '2025-01-01';

Column order in a composite index should reflect the most common query patterns: put the most selective filter first, then the next most selective, and so on. If queries filter equally often by different column combinations, consider separate indexes for each pattern.

06

Covering Indexes: The Ultimate Performance Trick

A covering index includes all columns needed to answer a query — filter conditions and selected columns — so the database never has to touch the main table. This eliminates the random I/O of row lookups entirely.

-- Common query
SELECT email, name FROM users WHERE status = 'active' ORDER BY created_at;

-- Non-covering index (still needs table access)
CREATE INDEX ON users (status);

-- Covering index (satisfies entire query from index)
CREATE INDEX ON users (status, created_at, email, name);
-- PostgreSQL supports INCLUDE columns for this pattern:
CREATE INDEX ON users (status, created_at) INCLUDE (email, name);

PostgreSQL's INCLUDE clause (added in PG 11) allows adding columns to the leaf level of the index without including them in the B-tree structure — they are just along for the ride to support covering queries without affecting the B-tree ordering.

07

Reading Query Plans with EXPLAIN

-- PostgreSQL: see the query plan with actual execution times
EXPLAIN ANALYZE SELECT * FROM orders
WHERE customer_id = 12345 AND status = 'pending';

-- Look for:
-- Seq Scan → full table scan (may need an index)
-- Index Scan → using the index correctly
-- Index Only Scan → covering index (best for reads)
-- Bitmap Index Scan → multi-index scan merged
-- Rows / actual rows mismatch → stale statistics (run ANALYZE)

Key metrics to examine in EXPLAIN ANALYZE: the node type (Seq Scan vs Index Scan), the estimated vs actual row counts (large discrepancies indicate stale statistics), and the total cost. A high-cost Seq Scan on a large table is the clearest signal that an index is needed.

08

Common Indexing Mistakes

Build database skills that scale to production. Hands-on training.

The Precision AI Academy bootcamp covers databases, data engineering, and AI integration. $1,490. October 2026. 40 seats per city.

Reserve Your Seat
Denver New York City Dallas Los Angeles Chicago
09

Frequently Asked Questions

How do database indexes work?

Most relational databases use B-tree indexes. The index stores a sorted copy of the indexed column values with pointers to corresponding rows. To find matching rows, the database traverses the B-tree in O(log n) steps rather than scanning every row in O(n). For 10 million rows, that is 23 steps vs 10 million.

When should you add a database index?

Add an index when a column is frequently used in WHERE clauses, JOIN conditions, or ORDER BY in slow queries; the column has high cardinality; and the table has enough rows that full scans are slow. Do not index every column — indexes slow writes and consume disk space.

What is a covering index?

A covering index includes all columns needed to satisfy a query — filter conditions and selected columns — so the database answers the query entirely from the index without accessing the main table. This eliminates the random I/O cost of row lookups and is the highest-performance optimization for read-heavy query patterns.

How do you identify which queries need indexes?

Use EXPLAIN ANALYZE (PostgreSQL) or EXPLAIN (MySQL) to see the query execution plan. Look for sequential scans on large tables. Enable slow query logging to capture queries exceeding a time threshold. pg_stat_statements in PostgreSQL tracks cumulative query statistics and is invaluable for identifying frequently executed slow queries.

Database performance is a career-defining skill. Master it.

Two days of hands-on training in databases, data engineering, and AI. $1,490. Denver, NYC, Dallas, LA, Chicago. October 2026.

Reserve Your Seat

Note: Index behavior varies between database systems. Examples use PostgreSQL syntax. MySQL, SQLite, and other databases have similar concepts with different syntax details.

Bottom Line
Database indexing explained: how B-tree indexes work, when indexes help and when they hurt, how to diagnose slow queries, composite indexes, covering indexes, and query plan analysis.
BP

Written By

Bo Peng

Kaggle Top 200 · AI Engineer · Founder, Precision AI Academy

Bo builds production AI systems for U.S. federal agencies and teaches the Precision AI Academy bootcamp — a hands-on 2-day intensive in 5 U.S. cities. He writes weekly about what actually works in applied AI.

Kaggle Top 200 Federal AI Practitioner Former Adjunct Professor AI Builder