Day 5 of 5
⏱ ~55 minutes
PostgreSQL for Developers — Day 5

Production PostgreSQL: Migrations, Pooling, and Backups

Running queries in development is one skill. Running a database in production that stays up, stays fast, and doesn't lose data is another. This lesson covers the tools and practices that make the difference.

Schema Migrations With Alembic

Never edit a production database manually. Use migrations — versioned scripts that apply changes in a controlled, repeatable way. Alembic is the standard migration tool for SQLAlchemy/Python projects.

bash
pip install alembic sqlalchemy
alembic init migrations
python — migrations/env.py
from sqlalchemy import engine_from_config
from alembic import context

# Point Alembic at your database
config = context.config
config.set_main_option("sqlalchemy.url", "postgresql://localhost/mydb")
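Hardcoding the URL works for a demo, but a common refinement is to read it from the environment so the same env.py works in dev, CI, and production. A small sketch (the `database_url` helper is ours, not part of Alembic):

```python
import os

def database_url(default: str = "postgresql://localhost/mydb") -> str:
    """Prefer DATABASE_URL from the environment; fall back to the dev default."""
    return os.environ.get("DATABASE_URL", default)

# In env.py, instead of the hardcoded string:
# config.set_main_option("sqlalchemy.url", database_url())
```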
bash
# Generate a new migration
alembic revision --autogenerate -m "add embedding column to documents"

# Apply pending migrations
alembic upgrade head

# Roll back the last migration
alembic downgrade -1

# See current migration state
alembic current
python — example migration
from alembic import op
import sqlalchemy as sa
from pgvector.sqlalchemy import Vector  # pgvector's SQLAlchemy type

def upgrade():
    op.add_column('documents',
        sa.Column('embedding', Vector(1536), nullable=True))
    op.create_index('idx_documents_embedding',
        'documents', ['embedding'],
        postgresql_using='hnsw',
        postgresql_with={'m': 16, 'ef_construction': 64},
        # hnsw has no default operator class for vector — one must be named
        postgresql_ops={'embedding': 'vector_cosine_ops'})

def downgrade():
    op.drop_index('idx_documents_embedding', 'documents')
    op.drop_column('documents', 'embedding')

Connection Pooling

Each database connection consumes memory (~5-10 MB of backend overhead). A web app that opens a new connection per request will quickly exhaust PostgreSQL's connection limit (max_connections, 100 by default). Use connection pooling to reuse connections.

python — SQLAlchemy pool
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql://user:pass@localhost/mydb",
    pool_size=10,        # connections to keep open
    max_overflow=20,     # extra connections allowed during spikes
    pool_timeout=30,     # seconds to wait for a connection
    pool_recycle=3600,   # recycle connections after 1 hour
)
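What pool_size and max_overflow actually mean is easiest to see in miniature. A toy illustration in plain Python (no database; this `ToyPool` is ours, not SQLAlchemy's internals): up to pool_size connections are kept and reused, up to max_overflow extras are created during spikes and discarded on return, and beyond that a checkout has to wait (or time out).

```python
import queue

class ToyPool:
    """Toy stand-in for a connection pool, illustrating pool_size/max_overflow."""

    def __init__(self, pool_size: int, max_overflow: int):
        self._idle = queue.Queue()
        self.pool_size = pool_size
        self.max_overflow = max_overflow
        self.open_count = 0  # connections currently checked out or idle

    def checkout(self):
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            if self.open_count >= self.pool_size + self.max_overflow:
                raise TimeoutError("pool exhausted (real pool waits pool_timeout)")
            self.open_count += 1
            return object()  # stand-in for opening a real connection

    def checkin(self, conn):
        if self._idle.qsize() < self.pool_size:
            self._idle.put(conn)       # keep it around for reuse
        else:
            self.open_count -= 1       # overflow connection: "close" it

pool = ToyPool(pool_size=2, max_overflow=1)
a, b, c = pool.checkout(), pool.checkout(), pool.checkout()  # 2 pooled + 1 overflow
# a fourth checkout here would raise: the pool is exhausted
```

The real pool behaves the same way, just with actual sockets and a timed wait instead of an immediate error.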
bash — PgBouncer (external pooler for high traffic)
# Install
sudo apt-get install pgbouncer

# /etc/pgbouncer/pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_port = 6432
pool_mode = transaction  # most efficient; session state (SET, prepared statements, advisory locks) does not survive across transactions
max_client_conn = 1000
default_pool_size = 20
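From the application's side, switching to PgBouncer changes only the port (6432 instead of 5432); credentials and SQL stay the same. A minimal sketch, assuming the pgbouncer.ini above (`pgbouncer_url` is a hypothetical helper, not a library function):

```python
def pgbouncer_url(user: str, password: str, db: str,
                  host: str = "127.0.0.1", port: int = 6432) -> str:
    """DSN that routes through PgBouncer instead of hitting PostgreSQL directly."""
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"

url = pgbouncer_url("user", "pass", "mydb")
# Hand this to SQLAlchemy as usual: create_engine(url, pool_size=5)
# With pool_mode = transaction, keep the app-side pool small --
# PgBouncer is already multiplexing clients onto a shared server pool.
```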

Backups

bash
# Logical backup — SQL dump
pg_dump -U postgres mydb > backup_$(date +%Y%m%d).sql

# Restore from dump
psql -U postgres mydb < backup_20240101.sql

# Compressed backup (much smaller for large databases)
pg_dump -U postgres -Fc mydb > backup.dump
pg_restore -U postgres -d mydb backup.dump

# Automate with cron (daily at 2am)
0 2 * * * pg_dump -U postgres mydb | gzip > /backups/mydb_$(date +\%Y\%m\%d).sql.gz
⚠️
Test your backups. A backup you've never restored is a backup you've never tested. Set up a test restore process and run it periodically. Many incidents involve discovering the backup was corrupt at the worst possible time.
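The cron one-liner above can also live in a small script, which is easier to extend with restore tests later. A sketch (assumes `pg_dump` is on your PATH and a database named `mydb`; names are illustrative):

```python
import datetime
import subprocess

def backup_filename(dbname: str, when=None) -> str:
    """Timestamped name like mydb_20240101.dump (custom-format dump)."""
    when = when or datetime.date.today()
    return f"{dbname}_{when.strftime('%Y%m%d')}.dump"

def run_backup(dbname: str = "mydb") -> str:
    out = backup_filename(dbname)
    # -Fc: custom format (compressed, restorable with pg_restore)
    subprocess.run(["pg_dump", "-U", "postgres", "-Fc", "-f", out, dbname],
                   check=True)
    return out
```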

Monitoring — Find Slow Queries

sql
-- Enable pg_stat_statements: add this to postgresql.conf, then restart the server
-- shared_preload_libraries = 'pg_stat_statements'
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find the slowest queries by average execution time
SELECT
    query,
    calls,
    mean_exec_time           AS avg_ms,
    rows / calls             AS avg_rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Find currently running queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC;

-- Kill a stuck query
SELECT pg_terminate_backend(pid) FROM pg_stat_activity
WHERE pid = 12345;

VACUUM and AUTOVACUUM

PostgreSQL doesn't immediately delete rows on DELETE or UPDATE. It marks them as dead. VACUUM reclaims that space. PostgreSQL runs autovacuum automatically, but large tables sometimes need a manual nudge.

sql
-- Check table bloat
SELECT
    schemaname,
    relname           AS table_name,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    n_dead_tup        AS dead_rows,
    last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

-- Manual vacuum on a specific table
VACUUM ANALYZE documents;

Deploying to Supabase or Railway

bash
# Supabase — managed PostgreSQL with pgvector built in
# 1. Create project at supabase.com
# 2. Get the connection string from Settings > Database
# 3. Run migrations against the Supabase URL

DATABASE_URL="postgresql://postgres:[password]@db.[ref].supabase.co:5432/postgres"
alembic upgrade head

# Railway — another managed option
# Provision PostgreSQL first (railway add), then deploy
railway up
📝 Exercise
Deploy a Production-Ready Database
  1. Set up Alembic for your project. Create an initial migration that creates your tables.
  2. Write a second migration that adds a new column to an existing table.
  3. Apply both migrations with alembic upgrade head.
  4. Configure SQLAlchemy with pool_size=5 and pool_recycle=3600.
  5. Create a backup script that dumps your database and timestamps the filename.
  6. Enable pg_stat_statements and identify the slowest query in your app.

Lesson Summary

  • Use Alembic migrations — never edit production schemas manually. Every schema change is a versioned script.
  • SQLAlchemy's built-in connection pool handles most cases. Use PgBouncer for high-concurrency production traffic.
  • Automate backups with pg_dump + cron. Test restores regularly.
  • pg_stat_statements and EXPLAIN ANALYZE are your tools for finding and fixing slow queries.

Go Deeper in the Live Bootcamp

In the 3-day bootcamp you'll design a production database schema, write migrations, connect it to a Next.js app, and deploy the whole stack. Denver, LA, NYC, Chicago, Dallas — October 2026.

Reserve Your Seat — $1,490
Challenge

Set up Supabase (free tier). Create your schema via migrations, enable pgvector, store 20 documents with embeddings, and write a Python script that does semantic search against the live Supabase database.

Finished this lesson?