Database Indexing Guide 2026: Speed Up Queries 100x

Q: How do database indexes work?

Most relational database indexes use a B-tree (balanced tree) data structure. The index stores a sorted copy of the indexed column values alongside pointers to the corresponding table rows. To find rows matching a condition, the database traverses the B-tree from root to leaf — an O(log n) operation — rather than scanning every row in the table — O(n). For a table with 10 million rows, an index reduces a sequential scan from 10 million comparisons to about 23 (log2 of 10 million).

Q: When should you add a database index?

Add an index on a column when: it is frequently used in WHERE clauses, JOIN conditions, or ORDER BY/GROUP BY in slow queries; the column has high cardinality (many distinct values — not useful to index a boolean column); and the table has enough rows that a full scan is noticeably slow (typically thousands of rows or more). Do not add an index on every column — indexes slow down INSERT, UPDATE, and DELETE operations and consume disk space.

Q: How do you identify which queries need indexes?

The primary tool is EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL) — it shows the query execution plan, including whether the database uses an index or performs a sequential scan, the estimated row counts, and the actual execution time. Look for sequential scans on large tables, high row estimates that do not match actual results, and nodes that show high cost. Most databases also have slow query logs — enable them to capture queries exceeding a time threshold.

In This Guide

Why Queries Get Slow and How Indexes Fix It
How B-Tree Indexes Work
Index Types: B-Tree, Hash, GIN, and More
When to Add an Index (and When Not To)
Composite Indexes: Column Order Matters
Covering Indexes: The Ultimate Performance Trick
Reading Query Plans with EXPLAIN
Common Indexing Mistakes
Frequently Asked Questions

Key Takeaways

A well-placed index turns a sequential scan into a B-tree traversal — the difference between scanning 10 million rows and checking 23 nodes. This is where "100x faster" comes from.
Indexes cost write performance. Every INSERT, UPDATE, and DELETE must update all indexes on the table. Index everything you need; index nothing you do not.
Column order in composite indexes is not arbitrary. The index is only useful for queries that filter on a left-prefix of the indexed columns. Understanding this prevents hours of debugging mysterious slow queries.
Use EXPLAIN ANALYZE before and after adding an index. Do not guess — measure. The query plan tells you exactly what the optimizer is doing.

The single most impactful performance optimization in most web applications is not caching, not horizontal scaling, and not query rewriting — it is adding the right database index. I have seen applications go from timing out under load to handling 10x the traffic with a single CREATE INDEX statement. Understanding indexes is not optional knowledge for backend developers — it is foundational.

Why Queries Get Slow and How Indexes Fix It

Without an index, finding rows that match a condition requires a full table scan: the database reads every row and checks whether it matches. For a table with 10 million rows, a query like SELECT * FROM orders WHERE customer_id = 12345 reads all 10 million rows, even though only a handful match. As the table grows, the query gets linearly slower.

An index on customer_id lets the database jump directly to the matching rows. The index contains a sorted structure that maps customer_id values to row locations. The database traverses this structure in O(log n) time — about 23 steps for 10 million rows — then jumps directly to the matching rows. The query that took seconds now takes milliseconds.

How B-Tree Indexes Work

The B-tree (balanced tree) is the default index type in PostgreSQL, MySQL, SQLite, and most other relational databases. It is a self-balancing tree that keeps all leaf nodes at the same depth, ensuring consistent O(log n) lookup regardless of which value you search for.

Structure of a B-tree index:

Root node: The top of the tree. Contains key values that divide the key space into ranges, and pointers to child nodes.
Internal nodes: Intermediate levels between root and leaves. Each contains key values and pointers to child nodes.
Leaf nodes: The bottom level. Contains the actual indexed values and pointers (row IDs / heap tuple IDs) to the corresponding rows in the table.
Linked leaf nodes: Leaf nodes are linked in order, enabling efficient range scans: WHERE price BETWEEN 100 AND 500 traverses the tree to find 100, then follows the leaf node links to 500.

This structure explains why B-tree indexes support equality lookups (=), range queries (<, >, BETWEEN), prefix matching (LIKE 'abc%'), and ORDER BY — but not suffix matching (LIKE '%abc'), which would require reading the entire leaf level.

Index Types: B-Tree, Hash, GIN, and More

B-Tree (Default)

Best for: equality comparisons, range queries, ordering, prefix searches. The right choice for 90% of indexes.

Hash

Best for: exact equality lookups only (=). Hash indexes can be faster than B-tree for equality-only lookups, but they do not support range queries, ordering, or prefix searches. PostgreSQL hash indexes are crash-safe since PostgreSQL 10. Limited use cases — B-tree is usually the better default.

GIN (Generalized Inverted Index)

Best for: indexing array columns, JSONB columns, and full-text search. A GIN index on a JSONB column lets you efficiently query nested JSON values: WHERE metadata->>'status' = 'active'. Essential for PostgreSQL applications that use JSONB extensively.

GiST (Generalized Search Tree)

Best for: geometric data types, full-text search (alternative to GIN), and network address types. Used for PostGIS geospatial queries.

Partial Indexes

A partial index includes only rows that satisfy a condition. CREATE INDEX ON orders (customer_id) WHERE status = 'pending' creates a smaller, faster index for the common query pattern of finding pending orders. The index is smaller (only pending orders) and the queries that use it are faster.

When to Add an Index (and When Not To)

Add an index when:

The column appears frequently in WHERE clauses, JOIN conditions, or ORDER BY on queries that are slow
The column has high cardinality (many distinct values). An index on a boolean column with only two values is rarely useful.
The table has enough rows that a full scan is noticeably slow (typically 10,000+ rows, depending on the query)
EXPLAIN shows a sequential scan (Seq Scan in PostgreSQL) on a large table

Do not add an index when:

The table is small — full scans on small tables are fast, and index overhead is not worth it
The column has low cardinality (status columns with 3–5 values, boolean columns) — the optimizer may choose a full scan anyway
The table is write-heavy — each index adds overhead to every write operation
The index would never be used based on actual query patterns

Composite Indexes: Column Order Matters

A composite index on (a, b, c) supports queries that filter on a, on (a, b), or on (a, b, c) — but not queries that only filter on b or c without a. This "left-prefix" rule is the most commonly misunderstood aspect of indexing.

-- Index: CREATE INDEX ON orders (customer_id, status, created_at)

-- USES the index (matches left-prefix)
SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM orders WHERE customer_id = 123 AND status = 'shipped';
SELECT * FROM orders WHERE customer_id = 123 AND status = 'shipped' AND created_at > '2025-01-01';

-- DOES NOT use the index (skips customer_id)
SELECT * FROM orders WHERE status = 'shipped';
SELECT * FROM orders WHERE created_at > '2025-01-01';

Column order in a composite index should reflect the most common query patterns: put the most selective filter first, then the next most selective, and so on. If queries filter equally often by different column combinations, consider separate indexes for each pattern.

Covering Indexes: The Ultimate Performance Trick

A covering index includes all columns needed to answer a query — filter conditions and selected columns — so the database never has to touch the main table. This eliminates the random I/O of row lookups entirely.

-- Common query
SELECT email, name FROM users WHERE status = 'active' ORDER BY created_at;

-- Non-covering index (still needs table access)
CREATE INDEX ON users (status);

-- Covering index (satisfies entire query from index)
CREATE INDEX ON users (status, created_at, email, name);
-- PostgreSQL supports INCLUDE columns for this pattern:
CREATE INDEX ON users (status, created_at) INCLUDE (email, name);

PostgreSQL's INCLUDE clause (added in PG 11) allows adding columns to the leaf level of the index without including them in the B-tree structure — they are just along for the ride to support covering queries without affecting the B-tree ordering.

Reading Query Plans with EXPLAIN

-- PostgreSQL: see the query plan with actual execution times
EXPLAIN ANALYZE SELECT * FROM orders
WHERE customer_id = 12345 AND status = 'pending';

-- Look for:
-- Seq Scan → full table scan (may need an index)
-- Index Scan → using the index correctly
-- Index Only Scan → covering index (best for reads)
-- Bitmap Index Scan → multi-index scan merged
-- Rows / actual rows mismatch → stale statistics (run ANALYZE)

Key metrics to examine in EXPLAIN ANALYZE: the node type (Seq Scan vs Index Scan), the estimated vs actual row counts (large discrepancies indicate stale statistics), and the total cost. A high-cost Seq Scan on a large table is the clearest signal that an index is needed.

Common Indexing Mistakes

Indexing every column: Indexes slow down writes and consume disk space. Only index what queries actually need.
Wrong column order in composite indexes: The left-prefix rule means a (b, a) index does not help a query that only filters on a.
Not running EXPLAIN: Adding an index without checking whether the optimizer actually uses it. The optimizer may choose a sequential scan even with an index if the table is small or the predicate is not selective.
Implicit type conversion prevents index use: WHERE id = '123' where id is an integer may trigger a sequential scan because the type conversion prevents index matching.
Functions on indexed columns: WHERE LOWER(email) = '[email protected]' does not use an index on email — it uses a full scan. Use a functional index: CREATE INDEX ON users (LOWER(email)).

Build database skills that scale to production. Hands-on training.

The Precision AI Academy bootcamp covers databases, data engineering, and AI integration. $1,490. June–October 2026 (Thu–Fri). 40 seats per city.

Reserve Your Seat

Denver New York City Dallas Los Angeles Chicago

Frequently Asked Questions

How do database indexes work?

Most relational databases use B-tree indexes. The index stores a sorted copy of the indexed column values with pointers to corresponding rows. To find matching rows, the database traverses the B-tree in O(log n) steps rather than scanning every row in O(n). For 10 million rows, that is 23 steps vs 10 million.

When should you add a database index?

Add an index when a column is frequently used in WHERE clauses, JOIN conditions, or ORDER BY in slow queries; the column has high cardinality; and the table has enough rows that full scans are slow. Do not index every column — indexes slow writes and consume disk space.

What is a covering index?

A covering index includes all columns needed to satisfy a query — filter conditions and selected columns — so the database answers the query entirely from the index without accessing the main table. This eliminates the random I/O cost of row lookups and is the highest-performance optimization for read-heavy query patterns.

How do you identify which queries need indexes?

Use EXPLAIN ANALYZE (PostgreSQL) or EXPLAIN (MySQL) to see the query execution plan. Look for sequential scans on large tables. Enable slow query logging to capture queries exceeding a time threshold. pg_stat_statements in PostgreSQL tracks cumulative query statistics and is invaluable for identifying frequently executed slow queries.

Database performance is a career-defining skill. Master it.

Two days of hands-on training in databases, data engineering, and AI. $1,490. Denver, NYC, Dallas, LA, Chicago. June–October 2026 (Thu–Fri).

Reserve Your Seat

Note: Index behavior varies between database systems. Examples use PostgreSQL syntax. MySQL, SQLite, and other databases have similar concepts with different syntax details.

Bottom Line

Database indexing explained: how B-tree indexes work, when indexes help and when they hurt, how to diagnose slow queries, composite indexes, covering indexes, and query plan analysis.

Our Take

Most index problems are actually schema design problems in disguise.

When a query needs an index to be tolerable, that is often a sign the data model needs a second look — not that you should keep bolting on indexes. We see this constantly: a table with 15 indexes that nobody audited in two years, half of them unused, all of them slowing down the write path. The "just add an index" reflex is fast and feels productive, but it compounds technical debt in exactly the way that is hardest to unwind later. The right habit is to reach for EXPLAIN first, question whether the query pattern is right, and only then reach for CREATE INDEX.

The covering index — specifically PostgreSQL's INCLUDE clause — is the most underused optimization we know of. A non-covering index on a high-traffic query forces a heap fetch for every matched row; on spinning disks or heavily-loaded SSDs that random I/O is a wall. Adding two or three INCLUDE columns to an existing index can eliminate thousands of heap fetches per second without adding index maintenance cost to the B-tree structure itself. It is a genuinely asymmetric optimization: almost free on writes, significant on reads. Most teams discover it only after a painful production incident.

For anyone learning database engineering right now, the practical takeaway is this: understanding EXPLAIN ANALYZE output — not just running it, but reading the actual vs estimated rows, identifying nested loop vs hash join vs merge join, and spotting stale statistics — is a skill that will make you the most valuable person in any backend engineering room. Courses skip it because it is not glamorous. That is precisely why it is worth learning.

Published By

Precision AI Academy

Practitioner-focused AI education · 2-day in-person bootcamp in 5 U.S. cities

Precision AI Academy publishes deep-dives on applied AI engineering for working professionals. Founded by Bo Peng (Kaggle Top 200) who leads the in-person bootcamp in Denver, NYC, Dallas, LA, and Chicago.

Kaggle Top 200 Federal AI Practitioner 5 U.S. Cities Thu–Fri Cohorts