Data Modeling Guide [2026]: ER Diagrams to Production Schema

Key Takeaways

  • Start with conceptual modeling (entities and relationships), then translate to logical, then physical schema
  • Normalize to 3NF for OLTP systems — redundancy is the enemy of data integrity
  • Every many-to-many relationship needs a junction table in the physical schema
  • Index foreign keys and query predicates — poor indexing is the most common production performance problem
  • OLAP (analytics) schemas intentionally denormalize into star or snowflake schemas for read speed

Start with Entities and Relationships — Not Tables

Good data modeling starts conceptually, before any SQL. Entity-Relationship (ER) modeling captures the business domain in terms of entities (things) and relationships (how things connect).

An entity is a noun in the business domain: Customer, Order, Product, Employee, Account. Entities become tables. An attribute is a property of an entity: customer name, order date, product price. Attributes become columns.

The three levels of data modeling:

The Three Relationship Types

One-to-One (1:1) — One entity on each side. Rare and often signals the entities should merge. Example: a User and their Profile (if every user has exactly one profile). Implement with a foreign key + unique constraint on the child table.

One-to-Many (1:N) — One entity relates to many on the other side. The most common relationship. Example: a Customer has many Orders. The "many" side gets the foreign key: orders.customer_id REFERENCES customers(id).

Many-to-Many (M:N) — Many entities on both sides. Example: Students and Courses (a student takes many courses; a course has many students). Cannot be represented directly — requires a junction table:

-- Junction table for M:N student-course relationship
CREATE TABLE enrollments (
    student_id  BIGINT NOT NULL REFERENCES students(id),
    course_id   BIGINT NOT NULL REFERENCES courses(id),
    enrolled_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    grade       CHAR(2),
    PRIMARY KEY (student_id, course_id)
);

CREATE INDEX idx_enrollments_course ON enrollments(course_id);

Normalization: Eliminating Redundancy

Normalization is the formal process of structuring a schema to reduce redundancy. Each normal form fixes a specific class of problem:

Normal FormRuleProblem Fixed
1NFAtomic values, no repeating groupsArrays/lists in a column (store tags as separate rows, not CSV)
2NFNo partial dependencies (every non-key attr depends on the full PK)Composite key tables where some columns depend on only part of the key
3NFNo transitive dependencies (non-key attrs don't depend on other non-key attrs)Storing both zip_code and city — city depends on zip, not directly on the PK
BCNFEvery functional dependency has a superkey on the leftEdge cases in 3NF with multiple overlapping candidate keys

For most production OLTP systems, 3NF is the right target. Don't go to 4NF/5NF unless you have a specific multi-valued dependency problem.

Translating an ER Diagram to SQL Schema

A real e-commerce example — customers, orders, products, and order line items:

CREATE TABLE customers (
    id         BIGSERIAL PRIMARY KEY,
    email      TEXT NOT NULL UNIQUE,
    name       TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE products (
    id          BIGSERIAL PRIMARY KEY,
    sku         TEXT NOT NULL UNIQUE,
    name        TEXT NOT NULL,
    price_cents INTEGER NOT NULL CHECK (price_cents >= 0),
    stock_qty   INTEGER NOT NULL DEFAULT 0
);

CREATE TABLE orders (
    id          BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL REFERENCES customers(id),
    status      TEXT NOT NULL DEFAULT 'pending',
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE order_items (
    id         BIGSERIAL PRIMARY KEY,
    order_id   BIGINT NOT NULL REFERENCES orders(id),
    product_id BIGINT NOT NULL REFERENCES products(id),
    quantity   INTEGER NOT NULL CHECK (quantity > 0),
    unit_price INTEGER NOT NULL  -- snapshot price at order time
);

-- Indexes on foreign keys (not created automatically in PostgreSQL)
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_order_items_order ON order_items(order_id);
CREATE INDEX idx_order_items_product ON order_items(product_id);

Index Strategy: The Most Impactful Decision After Schema

After getting the schema right, index strategy is the biggest lever on query performance. Rules of thumb:

Common Schema Patterns

Soft deletes — Instead of DELETE, set a deleted_at TIMESTAMPTZ column. Add partial index: CREATE INDEX idx_users_active ON users(email) WHERE deleted_at IS NULL — the index only covers non-deleted rows.

Audit log — Create an audit_log table with table_name, record_id, action, old_values (JSONB), new_values (JSONB), changed_by, changed_at. Write to it via triggers.

Polymorphic associations — A comment that can belong to a Post or a Video. Options: separate tables per type (cleanest), abstract base table, or JSONB. The separate tables approach is usually best for schema clarity and referential integrity.

Hierarchical data — Categories with subcategories. Options: adjacency list (parent_id FK), nested sets, closure table, or PostgreSQL LTREE extension. Closure table is the most flexible for arbitrary-depth trees with efficient queries.

OLTP vs OLAP Schema Design

AspectOLTP (Normalized)OLAP (Denormalized)
GoalFast inserts, updates, deletesFast aggregation queries
Structure3NF normalized tablesStar/snowflake schema
JoinsMany joins — that's fineMinimize joins — pre-join into wide tables
RedundancyEliminatedIntentional (copies of data for read speed)
Typical databasesPostgreSQL, MySQLSnowflake, BigQuery, Redshift, ClickHouse

Learn Data Modeling and SQL at Precision AI Academy

Our bootcamp teaches database design, schema optimization, and modern data engineering — from ER diagrams to production systems that scale. Five cities, October 2026.

$1,490 · October 2026 · Denver, LA, NYC, Chicago, Dallas
Reserve Your Seat

Frequently Asked Questions

What is normalization and how far should I normalize?

Normalization eliminates redundancy by applying progressive rules (1NF → 2NF → 3NF). For most OLTP systems, 3NF is the target. Analytics schemas intentionally denormalize for read performance. Normalize first, then denormalize only where profiling shows a real problem.

When should I use a many-to-many relationship?

When entities on both sides can relate to multiple entities on the other side. Always implement with a junction table. The junction table often carries relationship attributes (enrollment date, role, etc.) making it a full entity in its own right.

Should I use UUIDs or auto-increment integers as primary keys?

Auto-increment integers are simpler and faster for single-database systems. UUIDs work better in distributed systems. Use UUID v7 (time-ordered) if you go the UUID route — it avoids B-tree fragmentation from random inserts.

BP
Bo Peng

Founder of Precision AI Academy. Software engineer with production database design experience. Teaches SQL, data modeling, and backend engineering to working professionals.