Key Takeaways
- Databases store data in pages (typically 8KB blocks) — understanding this explains index and query behavior
- B-tree indexes reduce lookup cost from O(n) to O(log n) — the difference between scanning 10 million rows and roughly 24 comparisons
- ACID transactions guarantee atomicity, consistency, isolation, and durability
- MVCC (Multi-Version Concurrency Control) enables reads without blocking writes — key to PostgreSQL's performance
- Write-Ahead Logging (WAL) is how databases survive crashes without losing committed data
How Databases Store Data: Pages and Storage Engines
Every database reads and writes data in fixed-size blocks called pages — typically 8KB in PostgreSQL, 16KB in MySQL. Understanding this is the foundation of understanding everything else about databases.
When you run SELECT * FROM users WHERE id = 5, the database doesn't read individual rows — it reads whole pages from disk into a memory buffer pool (the buffer cache), then finds your row within the page. This is why returning one row from a million-row table with a good index isn't much slower than returning one row from a 100-row table — the work is in the page reads, and the index tells you exactly which page to read.
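The page-and-buffer-pool behavior described above can be sketched in a few lines. This is a toy model, not a real engine: the `BufferPool` class, `PAGE_SIZE`, and `ROWS_PER_PAGE` values are illustrative assumptions; the point is that two rows on the same page cost only one disk read.

```python
# Toy model of page-oriented reads: the database pulls whole pages into a
# buffer pool (LRU-evicted), then locates the requested row inside the page.
from collections import OrderedDict

PAGE_SIZE = 8192       # bytes per page, as in PostgreSQL
ROWS_PER_PAGE = 100    # assumed row density, for the sketch only

class BufferPool:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> page contents, in LRU order
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.pages:                # cache hit: no disk I/O
            self.pages.move_to_end(page_id)
        else:                                    # cache miss: read from "disk"
            self.disk_reads += 1
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)   # evict least recently used
            self.pages[page_id] = f"page-{page_id}"
        return self.pages[page_id]

def fetch_row(pool, row_id):
    # An index would hand us this page number directly.
    return pool.get_page(row_id // ROWS_PER_PAGE)

pool = BufferPool(capacity=128)
fetch_row(pool, 5)       # row 5 lives on page 0: one disk read
fetch_row(pool, 42)      # row 42 is on the same page: served from the pool
print(pool.disk_reads)   # 1
```

The second fetch is "free" because the page is already cached — the same reason a well-indexed lookup on a hot table rarely touches disk at all.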
Storage engines handle the physical layer:
- InnoDB (MySQL default) — Row-level locking, ACID transactions, MVCC, clustered primary key index
- PostgreSQL heap storage — Stores rows in heap files, separate B-tree indexes, MVCC via row versioning
- RocksDB (developed at Facebook/Meta) — LSM-tree based, optimized for write-heavy workloads, embedded as a storage engine in MyRocks (MySQL/MariaDB) and other systems
- WiredTiger (MongoDB default) — B-tree based document storage (an LSM mode existed but has been deprecated)
Indexes and B-Trees: The Core of Query Performance
A B-tree ("balanced tree") index is the data structure behind virtually every default database index. It's a self-balancing tree where every path from root to leaf has the same length, guaranteeing O(log n) search time regardless of which value you look up.
B-tree structure for a simple index on user_id:
- Root node holds a few key values that partition the data
- Internal nodes guide the search down the tree
- Leaf nodes contain the actual key values and row pointers (or, for clustered indexes, the row data itself)
For a table with 10 million rows, a B-tree of order 100 has height about 3-4. That's 3-4 page reads to find any row — compared to scanning all 10 million rows without an index.
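The height estimate above is just a logarithm, which we can sketch directly. The helper name `btree_height` is ours; the formula is the standard log-base-order bound.

```python
import math

def btree_height(n_rows, order):
    # Each node fans out to up to `order` children,
    # so the tree height grows as log_order(n_rows).
    return math.ceil(math.log(n_rows, order))

print(btree_height(10_000_000, 100))  # 4: a handful of page reads
print(btree_height(10_000_000, 2))    # 24: the binary-comparison count
```

The order-100 tree gives the 3-4 page reads quoted above; the order-2 case recovers the "~24 comparisons" figure from the key takeaways.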
Index types in PostgreSQL:
| Index Type | Structure | Best For |
|---|---|---|
| B-tree (default) | Balanced tree | Equality, range queries, sorting |
| Hash | Hash table | Equality lookups only, faster than B-tree for exact match |
| GiST | Generalized search tree | Geometric data, full-text search, range types |
| GIN | Generalized inverted index | Full-text search, JSONB, array columns |
| BRIN | Block range index | Very large tables with natural ordering (timestamps) |
Query Execution: From SQL to Data
When you submit a SQL query, the database goes through several stages:
- Parsing — SQL text is parsed into an abstract syntax tree (AST)
- Rewriting — Views are expanded, rules are applied
- Planning/Optimization — The query planner generates candidate execution plans, estimates their cost using table statistics, and picks the cheapest.
- Execution — The chosen plan is executed. Each node in the plan tree (Seq Scan, Index Scan, Hash Join, etc.) processes data and passes results up.
Always use EXPLAIN ANALYZE to understand what your database is actually doing:
```sql
EXPLAIN ANALYZE SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2026-01-01'
GROUP BY u.id, u.name;
-- Output shows:
-- Hash Aggregate (actual rows, actual time)
--   -> Hash Left Join (hash cond: o.user_id = u.id)
--        -> Index Scan on users (using idx_users_created_at)
--        -> Seq Scan on orders
```
Key things to look for: Seq Scan on large tables (needs an index), high row estimates vs actual (stale statistics — run ANALYZE), expensive Sort operations (add index for the ORDER BY column).
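You can watch the planner change its mind in miniature with SQLite, which ships with Python's standard library. This is a stand-in, not PostgreSQL — SQLite's `EXPLAIN QUERY PLAN` output wording differs from `EXPLAIN ANALYZE` — but the scan-vs-index-search decision is the same idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    INSERT INTO users (email) VALUES ('a@x.com'), ('b@x.com');
""")

# Without an index on email, the only option is a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = 'a@x.com'"
).fetchall()
print(plan[0][3])   # detail column reads something like "SCAN users"

# Add an index and the planner switches to an index search.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = 'a@x.com'"
).fetchall()
print(plan[0][3])   # something like "SEARCH users USING INDEX idx_users_email"
```

The exact wording varies by SQLite version, but the before/after contrast mirrors the Seq Scan vs Index Scan distinction in the PostgreSQL plan above.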
Transactions and ACID
A transaction is a unit of work that either completes entirely or not at all. The ACID properties guarantee reliability:
- Atomicity — All or nothing. If a bank transfer deducts $100 from one account but fails before crediting the other, the deduction is rolled back.
- Consistency — Transactions take the database from one valid state to another. Foreign key constraints, NOT NULL, CHECK constraints are enforced within transactions.
- Isolation — Concurrent transactions don't see each other's uncommitted changes. Different isolation levels offer different tradeoffs between correctness and performance.
- Durability — Once committed, data survives crashes. Guaranteed by the write-ahead log (WAL).
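The bank-transfer example of atomicity is easy to demonstrate with SQLite via Python's `sqlite3` (a stand-in for a full RDBMS; the `transfer` function and the simulated crash are our own illustration). The `with conn:` block opens a transaction that commits on success and rolls back on any exception.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 500), ("bob", 500)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # transaction: commit on success, rollback on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            if dst == "nobody":           # simulate a crash mid-transfer
                raise RuntimeError("crash between debit and credit")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
    except RuntimeError:
        pass  # the debit above was already rolled back

transfer(conn, "alice", "nobody", 100)   # fails: no money disappears
print(conn.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()[0])  # 500
```

Despite the debit executing before the "crash", Alice's balance is untouched — the all-or-nothing guarantee in action.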
Isolation levels (weakest to strongest):
| Level | Dirty Read | Non-Repeatable Read | Phantom Read | Use When |
|---|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible | Rarely — analytics only |
| Read Committed (PostgreSQL default) | No | Possible | Possible | Most OLTP workloads |
| Repeatable Read | No | No | Possible | When consistent reads matter |
| Serializable | No | No | No | Financial, auditing systems |
MVCC: How PostgreSQL Handles Concurrent Access
Multi-Version Concurrency Control (MVCC) is the technique PostgreSQL uses to allow reads and writes to happen concurrently without blocking each other. Instead of locking rows for reads, the database keeps multiple versions of each row.
Each row version has hidden system columns: xmin (the transaction ID that created it) and xmax (the transaction ID that deleted/updated it, or 0 if current). When you read, you see only row versions that were committed before your transaction started — a consistent snapshot in time.
The tradeoff: old row versions accumulate and must be cleaned up by the VACUUM process. AUTOVACUUM handles this automatically, but understanding it matters for high-write systems.
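The xmin/xmax visibility rule described above can be sketched directly. This is a deliberately simplified model (it ignores in-progress and aborted transactions, which real PostgreSQL snapshots must handle); the `RowVersion` class and `visible` function are our own names.

```python
# Minimal sketch of MVCC snapshot visibility, loosely modeled on
# PostgreSQL's xmin/xmax rules (simplified: all txids shown are committed).
from dataclasses import dataclass

@dataclass
class RowVersion:
    value: str
    xmin: int   # txid that created this version
    xmax: int   # txid that deleted/replaced it; 0 means still current

def visible(row, snapshot_txid):
    # Visible if created at or before our snapshot, and either never
    # deleted or deleted after our snapshot was taken.
    created_before = row.xmin <= snapshot_txid
    not_deleted = row.xmax == 0 or row.xmax > snapshot_txid
    return created_before and not_deleted

# txid 10 wrote 'v1'; later, txid 20 replaced it with 'v2'.
versions = [RowVersion("v1", xmin=10, xmax=20),
            RowVersion("v2", xmin=20, xmax=0)]

read_at_15 = [v.value for v in versions if visible(v, 15)]
read_at_25 = [v.value for v in versions if visible(v, 25)]
print(read_at_15)   # ['v1'] -- the older snapshot still sees the old version
print(read_at_25)   # ['v2']
```

A transaction that took its snapshot at txid 15 keeps seeing 'v1' even after txid 20 commits — reads never block on the writer, and the writer never blocks on readers.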
Write-Ahead Logging: Durability Without Sacrifice
The Write-Ahead Log (WAL) is how databases guarantee durability. Before any change is written to the actual data files, it is first appended to the WAL — a sequential, append-only log on disk. Sequential writes are far faster than random writes, so the WAL buys durability cheaply.
On crash recovery, the database replays WAL entries to reconstruct any changes that made it to the log but not to the data files. This is why committed transactions survive crashes.
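The log-then-apply-then-replay cycle can be sketched in a toy form. The `TinyWAL` class is entirely illustrative (a real WAL records physical page changes, fsyncs, and tracks checkpoints), but it shows why replaying the log after a crash restores committed writes.

```python
# Toy write-ahead log: append the change durably first, apply it to the
# data files second; after a "crash", replay the log to recover.
import json

class TinyWAL:
    def __init__(self):
        self.log = []    # stands in for the sequential on-disk log
        self.data = {}   # stands in for the data files

    def write(self, key, value):
        # Step 1: append to the log -- this is the durability point.
        self.log.append(json.dumps({"key": key, "value": value}))
        # Step 2: apply to the data files (this part may be lost in a crash).
        self.data[key] = value

    def recover(self):
        # Replay every logged change; re-applying one is harmless (idempotent).
        for entry in self.log:
            rec = json.loads(entry)
            self.data[rec["key"]] = rec["value"]

db = TinyWAL()
db.write("a", 1)
db.write("b", 2)
db.data.clear()   # simulate a crash that wiped the unflushed data files
db.recover()      # replaying the WAL reconstructs the committed state
print(db.data)    # {'a': 1, 'b': 2}
```

Replay being idempotent matters: recovery can safely re-apply changes that did make it to the data files, which is also what lets replicas apply the same WAL stream.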
WAL also enables streaming replication — the primary server sends WAL records to replica servers in real time. Replicas apply the WAL and stay in sync. This is how PostgreSQL, MySQL, and most databases do high availability.
SQL vs NoSQL: Choosing the Right Tool
| Need | Reach For | Why |
|---|---|---|
| Complex queries, joins, reporting | PostgreSQL, MySQL | SQL is purpose-built for relational queries |
| Flexible document storage | MongoDB, DynamoDB | Schema-less, good for varying document shapes |
| Key-value cache | Redis, Memcached | In-memory, sub-millisecond reads |
| Time-series data | InfluxDB, TimescaleDB | Optimized ingestion and range queries over time |
| Graph relationships | Neo4j, Amazon Neptune | Traversals that would be JOIN hell in SQL |
| Write-heavy scale | Cassandra, DynamoDB | LSM trees optimized for high-volume writes |
Master Databases at Precision AI Academy
Our bootcamp covers SQL, query optimization, database design, and modern data stacks — skills every backend developer and data engineer needs. Five cities, October 2026.
Frequently Asked Questions
How do database indexes actually work?
B-tree indexes store sorted key values with pointers to rows. Searching is O(log n) instead of an O(n) full table scan: finding a row in a 10M-row table takes roughly 24 comparisons (log₂ of 10M) — only 3-4 page reads in a wide B-tree — versus reading up to 10M rows without an index. Indexes speed up reads but slow down writes, since every insert and update must also maintain them.
What is ACID and why does it matter?
ACID guarantees that database transactions are processed reliably — all or nothing (atomicity), consistent state (consistency), concurrent isolation, and crash durability. Critical for financial, medical, and any mission-critical system.
What is the difference between a clustered and non-clustered index?
A clustered index stores the actual row data in the B-tree leaf nodes — one per table. Non-clustered indexes store key values plus pointers, requiring a second lookup to fetch the row. Covering indexes include all needed columns to avoid that second lookup.
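The covering-index effect mentioned above is visible even in SQLite (again as a stand-in; the table, index name, and query are our own example). When every column the query needs lives in the index, the plan says so and the second lookup disappears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, total INT)")
conn.execute("CREATE INDEX idx_orders_user_total ON orders (user_id, total)")

# The query reads only user_id and total -- both are in the index,
# so the plan uses a covering index and never touches the table rows.
detail = conn.execute(
    "EXPLAIN QUERY PLAN SELECT total FROM orders WHERE user_id = 7"
).fetchall()[0][3]
print(detail)   # mentions "COVERING INDEX idx_orders_user_total"
```

PostgreSQL expresses the same idea as an Index Only Scan (and supports `INCLUDE` columns on indexes for exactly this purpose).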