In This Guide
- Why Relational Databases Still Win
- Schema Design: Tables, Columns, and Data Types
- Normalization: 1NF, 2NF, 3NF Explained Simply
- Relationships: One-to-One, One-to-Many, Many-to-Many
- Indexes: The Key to Query Performance
- The 5 Most Common Database Design Mistakes
- Database Migrations: Changing Schemas Without Breaking Things
- Frequently Asked Questions
Key Takeaways
- Start normalized, denormalize with evidence: Design your schema in 3NF first. Only denormalize when you have evidence of a specific performance problem that normalization causes. Premature denormalization is the most common schema design mistake.
- Choose data types carefully: Use the most specific data type for each column. VARCHAR(255) for everything is lazy and causes real problems — you cannot validate formats, you lose type-based query optimization, and you allow invalid data.
- Index foreign keys and query predicates: Every foreign key column should have an index. Every column that appears in a WHERE or ORDER BY clause in a high-frequency query should have an index. Unindexed foreign keys cause sequential scans on JOIN operations.
- Migrations over manual changes: Never modify a production database schema manually. Use migration tools (Flyway, Liquibase, Prisma Migrate, Alembic) to version and automate schema changes. Migrations are reversible; manual changes are not.
Bad database design is the most common cause of application performance problems that cannot be fixed without a rewrite. The wrong schema creates inefficient queries that no amount of indexing can fully compensate for, data integrity issues that compound over time, and migration costs that grow with every feature addition.
Good database design is a learnable skill with well-established principles. This guide covers the fundamentals: normalization, relationships, indexing, and the specific mistakes that cause the most pain in production systems.
Why Relational Databases Still Win
Relational databases (PostgreSQL, MySQL, SQL Server) remain the right default for most applications in 2026 because they provide ACID guarantees, enforce data integrity through constraints, and support complex queries through a mature, standardized query language (SQL).
ACID properties:
- Atomicity: A transaction either completes fully or not at all. No partial updates.
- Consistency: Transactions bring the database from one valid state to another. Constraints (foreign keys, unique, check) are enforced.
- Isolation: Concurrent transactions behave as if they ran sequentially. No dirty reads or phantom reads at appropriate isolation levels.
- Durability: Committed transactions persist even through system failures. The WAL (write-ahead log) ensures this.
When to consider NoSQL: document stores (MongoDB) for highly variable document structures; key-value stores (Redis) for caching and session storage; column stores (Cassandra, BigQuery) for time-series and analytical workloads; graph databases (Neo4j) for relationship-heavy data like social networks. Use the right tool for the specific workload — but start with PostgreSQL and move to specialized stores only when you have a specific need that PostgreSQL cannot meet efficiently.
Schema Design: Tables, Columns, and Data Types
Design tables to represent entities (users, orders, products) and relationships between them (user_id in orders links to users). Choose the most specific data type for each column.
Data type guidelines:
| Data | Recommended Type | Avoid |
|---|---|---|
| Primary keys | UUID (for distributed systems) or BIGSERIAL | INT if you expect >2B rows |
| Money/currency | NUMERIC(19,4) or store as cents in BIGINT | FLOAT (rounding errors) |
| Timestamps | TIMESTAMPTZ (with timezone) | TIMESTAMP (no timezone, ambiguous) |
| Email addresses | VARCHAR(254) with a CHECK constraint or citext | TEXT (no length limit enforcement) |
| Status/enum | PostgreSQL ENUM type or VARCHAR with CHECK | INT codes that require a lookup table |
| JSON data | JSONB (binary, indexed) | JSON (text, not indexed) |
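As a concrete instance of the money guidance above, a minimal sketch (the payments table and its columns are illustrative, not from this article):

```sql
-- NUMERIC stores exact decimal values; FLOAT would accumulate rounding
-- error across additions. The CHECK constraint rejects negative amounts.
CREATE TABLE payments (
  id BIGSERIAL PRIMARY KEY,
  amount NUMERIC(19,4) NOT NULL CHECK (amount >= 0),
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```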
Always add created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() and updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() to every table. These columns cost almost nothing and are invaluable for debugging, auditing, and cache invalidation.
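One common way to keep updated_at current automatically is a trigger. A minimal PostgreSQL sketch, assuming a users table with these two columns; the function and trigger names are illustrative:

```sql
-- Trigger function: stamp updated_at on every UPDATE.
CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
  NEW.updated_at := NOW();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attach it to the table (repeat per table that has updated_at).
CREATE TRIGGER trg_users_set_updated_at
  BEFORE UPDATE ON users
  FOR EACH ROW
  EXECUTE FUNCTION set_updated_at();
```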
Normalization: 1NF, 2NF, 3NF Explained Simply
Normalization is the process of structuring a relational database to reduce data redundancy and improve data integrity. The normal forms (1NF through BCNF) define increasingly strict rules for table structure.
First Normal Form (1NF): Each column contains atomic (indivisible) values. No repeating groups. Each row is uniquely identifiable by a primary key.
Violation example: a users table with a phone_numbers column that stores "555-1234, 555-5678" — two values in one cell. Fix: create a separate user_phone_numbers table with a foreign key to users.
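The fix described above might look like this, assuming users has a UUID primary key (the phone column name is illustrative):

```sql
-- One row per phone number instead of a comma-separated list.
CREATE TABLE user_phone_numbers (
  id BIGSERIAL PRIMARY KEY,
  user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  phone VARCHAR(20) NOT NULL,
  UNIQUE (user_id, phone)  -- no duplicate numbers per user
);
```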
Second Normal Form (2NF): In 1NF, plus every non-key column depends on the entire primary key (not just part of it). Relevant only for tables with composite primary keys.
Third Normal Form (3NF): In 2NF, plus no transitive dependencies. Non-key columns should depend only on the primary key, not on other non-key columns.
Violation example: an orders table that stores both customer_id and customer_email. The customer's email depends on the customer, not on the order. Fix: remove customer_email from orders and join to the customers table when you need it.
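After the fix, the email comes from a JOIN at query time. A sketch, assuming conventional column names on orders and customers:

```sql
-- The email lives only on customers; fetch it when needed.
SELECT o.id, o.total, c.email
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.id = $1;
```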
Practical guidance: Design to 3NF for all OLTP (transactional) databases. This is the normal form that eliminates the most common redundancy problems. Do not design to BCNF or 4NF by default — the additional complexity rarely provides proportional benefit for application databases.
Relationships: One-to-One, One-to-Many, Many-to-Many
One-to-Many is the most common relationship: one user has many orders, one category has many products, one author has many blog posts. Implement with a foreign key in the "many" table referencing the "one" table's primary key.
```sql
CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email VARCHAR(254) UNIQUE NOT NULL
);

CREATE TABLE orders (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  total NUMERIC(10,2) NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```

Many-to-Many requires a junction table. Users can belong to many organizations; organizations have many users. Create a memberships table with foreign keys to both users and organizations. The junction table can also carry relationship-specific data (the member's role, when they joined).
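A sketch of that junction table, assuming users and organizations tables with UUID primary keys; the role and joined_at columns illustrate relationship-specific data:

```sql
CREATE TABLE memberships (
  user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
  role VARCHAR(50) NOT NULL DEFAULT 'member',
  joined_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  -- Composite primary key: one membership per (user, organization) pair.
  PRIMARY KEY (user_id, organization_id)
);
```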
One-to-One is rare but sometimes appropriate for separating frequently-accessed core data from infrequently-accessed extended data. Store core user data in users and extended profile data in user_profiles with a user_id UNIQUE foreign key. This can improve query performance if the profile table is large and rarely needed.
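A minimal sketch of the one-to-one pattern. Making user_id the primary key of user_profiles gives the UNIQUE foreign key for free, since a primary key is NOT NULL and UNIQUE; the profile columns are illustrative:

```sql
CREATE TABLE user_profiles (
  -- PRIMARY KEY + REFERENCES enforces at most one profile per user.
  user_id UUID PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
  bio TEXT,
  avatar_url TEXT
);
```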
Indexes: The Key to Query Performance
An index is a data structure that allows the database to find rows matching a query condition without scanning the entire table. Without indexes, every query is a full table scan — reading every row. With indexes, the database can find matching rows in O(log n) time.
When to add an index:
- Every primary key (automatic in PostgreSQL)
- Every foreign key column (not automatic in PostgreSQL — add explicitly)
- Columns frequently used in WHERE clauses with high selectivity (many distinct values)
- Columns used in ORDER BY for queries that must be sorted
- Composite indexes for queries that filter on multiple columns together
```sql
-- Index the foreign key
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- Composite (partial) index for a common query pattern
CREATE INDEX idx_orders_user_status ON orders(user_id, status)
  WHERE status != 'completed';
```
Index trade-offs: indexes speed up reads but slow down writes (every INSERT, UPDATE, DELETE must update all relevant indexes). A table with 20 indexes pays the index-maintenance cost 20 times on every write, compared to once for the same table with 1 index. Add indexes based on actual slow queries, not speculatively.
Use EXPLAIN ANALYZE in PostgreSQL to see the query plan and identify sequential scans on large tables that should use an index. The pg_stat_statements extension tracks query performance metrics to identify the slowest queries in production.
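A hypothetical EXPLAIN ANALYZE session for the orders example above; the exact plan text varies by PostgreSQL version and table statistics:

```sql
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = '00000000-0000-0000-0000-000000000000';
-- Without an index on user_id, the plan shows "Seq Scan on orders";
-- with idx_orders_user_id in place, it shows
-- "Index Scan using idx_orders_user_id on orders".
```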
The 5 Most Common Database Design Mistakes
- Using VARCHAR(255) for everything. Money as VARCHAR, status codes as VARCHAR, IDs as VARCHAR. Use the appropriate type for each column. The database enforces type constraints at the storage level — let it.
- No foreign key constraints. Foreign keys enforce referential integrity at the database level. Without them, orphaned records accumulate — orders referencing deleted users, products in nonexistent categories. Add foreign keys to every relationship.
- Soft-deleting without proper indexes. Using an `is_deleted` flag is a common pattern, but every query must add `WHERE is_deleted = false`. This means the `is_deleted` column must be part of every composite index used by active-record queries. Miss one, and you will get deleted records leaking into results.
- Storing computed values instead of computing them. Storing a user's age instead of their birth date, storing an order total instead of computing it from line items. When the source data changes, the computed value becomes stale. Store source data and compute derived values at query time.
- Not using database transactions for multi-step operations. If your "create order" process inserts to orders, decrements inventory, and charges the customer, all three must succeed or all three must fail. Use a transaction — no partial state should be possible.
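A sketch of that multi-step "create order" wrapped in a transaction; the inventory table and the $-numbered parameters stand in for application-supplied details:

```sql
-- Either every step commits, or none of them do.
BEGIN;

INSERT INTO orders (id, user_id, total)
VALUES (gen_random_uuid(), $1, $2);

UPDATE inventory
SET quantity = quantity - 1
WHERE product_id = $3 AND quantity > 0;

-- ...charge the customer here; on any failure, issue ROLLBACK...

COMMIT;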
Database Migrations: Changing Schemas Without Breaking Things
A migration is a versioned script that transforms the database schema from one state to another. Use migration tools (Prisma Migrate, Alembic, Flyway, Liquibase, Rails migrations) to manage all schema changes — never run ALTER TABLE commands manually in production.
Safe migration patterns for zero-downtime deployments:
- Adding a nullable column: Safe. Add the column with DEFAULT or allow NULL. Deploy. Backfill data. Make NOT NULL in a later migration after all rows have values.
- Renaming a column: Dangerous. Add the new column, deploy code that writes to both old and new columns, backfill old column values to new column, deploy code that reads only from new column, drop old column. Four separate migrations and deployments.
- Dropping a column: Safe only after the code that reads the column has been removed and deployed. Never drop a column in the same migration as removing the code that uses it.
- Adding an index: Use `CREATE INDEX CONCURRENTLY` in PostgreSQL to add an index without locking the table. Regular `CREATE INDEX` takes a table-level lock that blocks writes during index creation on large tables.
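The nullable-column pattern from the list above, split into separate migrations; the locale column and its default are a hypothetical example:

```sql
-- Migration 1: add the column as nullable (safe, no long lock).
ALTER TABLE users ADD COLUMN locale VARCHAR(10);

-- Migration 2, after deploy: backfill existing rows, then enforce.
UPDATE users SET locale = 'en' WHERE locale IS NULL;
ALTER TABLE users ALTER COLUMN locale SET NOT NULL;
```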
Frequently Asked Questions
When should I use NoSQL instead of PostgreSQL?
Use NoSQL for specific workloads where it genuinely excels: Redis for caching and session storage, MongoDB for highly variable document structures where schema flexibility is more valuable than consistency guarantees, Cassandra for time-series data with massive write volume, DynamoDB for key-value access patterns at extreme scale with single-digit millisecond latency requirements. For most application data with relationships and ACID requirements, PostgreSQL is the right choice.
What is the difference between a primary key and a unique constraint?
A primary key is a combination of NOT NULL and UNIQUE. It uniquely identifies each row and cannot be null. A unique constraint allows the column to be null (and in PostgreSQL, multiple null values are permitted in a unique column since null is not equal to null) but requires non-null values to be unique. Every table should have a primary key. Use unique constraints for columns that must be unique but are not the primary identifier (like email in a users table where id is the primary key).
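A small sketch of the difference (the employees table and badge_number column are illustrative):

```sql
CREATE TABLE employees (
  id BIGSERIAL PRIMARY KEY,  -- NOT NULL + UNIQUE, always enforced
  badge_number INT UNIQUE    -- nullable; unique among non-NULL values only
);

INSERT INTO employees (badge_number) VALUES (NULL), (NULL);  -- both succeed
INSERT INTO employees (badge_number) VALUES (42);            -- succeeds
INSERT INTO employees (badge_number) VALUES (42);            -- fails: duplicate
```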
Should I use UUIDs or integers as primary keys?
Use UUIDs (v4) if: you generate IDs in the application layer before inserting to the database, you need to merge data from multiple databases, or you do not want sequential IDs exposed to users (security through obscurity for resource URLs). Use BIGSERIAL integers if: you want smaller storage footprint, simpler debugging, and slightly faster index lookups. UUID v7 (time-ordered) is a good compromise — it is a UUID (non-sequential-looking from the outside) but stored in time order, which avoids the write amplification of random UUID inserts in B-tree indexes.
What is an ORM and should I use one?
An ORM (Object-Relational Mapper) maps database tables to programming language objects and generates SQL queries from method calls. Prisma, Drizzle (TypeScript), SQLAlchemy (Python), and ActiveRecord (Ruby) are popular ORMs. Use an ORM for standard CRUD operations and schema management. Write raw SQL for complex queries, aggregations, and queries where the ORM generates inefficient SQL. The 80/20 rule: ORM for 80% of queries, raw SQL for the 20% where performance or complexity demands it.
Note: Information in this article reflects the state of the field as of early 2026. Technology evolves rapidly — verify specific details directly with vendors before making decisions.