Day 3 of 5
⏱ ~55 minutes
Build a Data Dashboard — Day 3

SQL Connections: Query a Real Database From Your Dashboard

CSV files are fine for prototypes. Real dashboards pull from real databases. Today you'll connect Streamlit to SQLite (locally) and PostgreSQL (for production), use SQLAlchemy, and cache queries so your app stays fast.

Setting Up a Local SQLite Database

SQLite is perfect for development and small apps. It's a single file, no server required. You can switch to PostgreSQL or MySQL later with just a connection string change.

bash
pip install sqlalchemy
python — create_db.py
import sqlite3
import pandas as pd
import numpy as np

# Create and populate a sample database
conn = sqlite3.connect("sales.db")

np.random.seed(42)
n = 500
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=n).strftime("%Y-%m-%d"),
    "region": np.random.choice(["North", "South", "East", "West"], n),
    "product": np.random.choice(["Widget A", "Widget B", "Widget C"], n),
    "revenue": np.random.randint(500, 5000, n),
    "units": np.random.randint(1, 50, n)
})

df.to_sql("sales", conn, if_exists="replace", index=False)
conn.close()
print("Database created: sales.db")

Run this once to create your database. You'll query it from the dashboard.
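Before wiring up the dashboard, it's worth a quick sanity check that the table looks right. A minimal sketch using the standard-library sqlite3 module — the same two checks work against sales.db (where the row count should be 500); an in-memory database with a tiny sales table stands in here so the snippet runs on its own:

```python
import sqlite3

# Stand-in for sales.db: same schema, two sample rows
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (date TEXT, region TEXT, product TEXT, revenue INTEGER, units INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("2024-01-01", "North", "Widget A", 1200, 10),
        ("2024-01-02", "South", "Widget B", 800, 5),
    ],
)

# Row count — against sales.db this should be 500
rows = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

# Distinct regions — confirms the region filter will have options
regions = [r[0] for r in conn.execute("SELECT DISTINCT region FROM sales ORDER BY region")]

print(rows, regions)  # 2 ['North', 'South']
conn.close()
```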

Connecting With SQLAlchemy

SQLAlchemy is the standard Python SQL toolkit. It handles connection pooling, parameterized queries, and works with every major database. The connection string format changes per database but the rest of your code stays the same.

python — connection strings
# SQLite (local file)
DATABASE_URL = "sqlite:///sales.db"

# PostgreSQL (driver: pip install psycopg2-binary)
DATABASE_URL = "postgresql://user:password@localhost:5432/mydb"

# PostgreSQL on Supabase / Railway / Render
DATABASE_URL = "postgresql://user:password@host:5432/dbname"

# MySQL (driver: pip install pymysql)
DATABASE_URL = "mysql+pymysql://user:password@localhost/mydb"
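Whichever URL you pick, a quick smoke test confirms the engine can actually reach the database before you build a UI on top of it. A sketch — shown with an in-memory SQLite URL so it runs anywhere, but any of the URLs above can be dropped in unchanged:

```python
from sqlalchemy import create_engine, text

# Swap in any connection string from above; the rest stays the same
DATABASE_URL = "sqlite:///:memory:"

engine = create_engine(DATABASE_URL)

# SELECT 1 is a cheap, dialect-portable connectivity check
with engine.connect() as conn:
    result = conn.execute(text("SELECT 1")).scalar()

print("connection ok:", result == 1)  # connection ok: True
```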

Caching Database Queries

Without caching, Streamlit re-runs your script — and re-queries the database — on every user interaction. That's slow and expensive. @st.cache_data stores the result and only re-runs the query when the function's arguments change or the cache expires.

python — dashboard.py
import streamlit as st
import pandas as pd
from sqlalchemy import create_engine, text

DATABASE_URL = "sqlite:///sales.db"

@st.cache_resource
def get_engine():
    """Create database engine once and reuse it."""
    return create_engine(DATABASE_URL)

@st.cache_data(ttl=300)  # cache for 5 minutes
def load_sales(region=None):
    """Load sales data, optionally filtered by region."""
    engine = get_engine()
    if region and region != "All":
        query = text("SELECT * FROM sales WHERE region = :region")
        df = pd.read_sql(query, engine, params={"region": region})
    else:
        df = pd.read_sql("SELECT * FROM sales", engine)
    df["date"] = pd.to_datetime(df["date"])
    return df

# UI
st.title("Sales Dashboard")

# Get unique regions for the filter (cached like any other query,
# so it doesn't hit the database on every rerun)
@st.cache_data(ttl=300)
def load_regions():
    engine = get_engine()
    regions_df = pd.read_sql("SELECT DISTINCT region FROM sales ORDER BY region", engine)
    return regions_df["region"].tolist()

regions = ["All"] + load_regions()

selected_region = st.selectbox("Region", regions)
df = load_sales(selected_region)

# Metrics row
col1, col2, col3 = st.columns(3)
col1.metric("Total Revenue", f"${df['revenue'].sum():,.0f}")
col2.metric("Total Units", f"{df['units'].sum():,}")
col3.metric("Avg Order Value", f"${df['revenue'].mean():,.0f}")

# Chart
import plotly.express as px
monthly = df.groupby(df["date"].dt.to_period("M").astype(str))["revenue"].sum().reset_index()
monthly.columns = ["month", "revenue"]
fig = px.bar(monthly, x="month", y="revenue", title="Monthly Revenue")
st.plotly_chart(fig, use_container_width=True)
💡
@st.cache_resource vs @st.cache_data: Use cache_resource for shared objects like database connections (created once, shared across users). Use cache_data for data results like query results (cached per unique set of arguments).

Secrets Management

Never hardcode database credentials in your script. Streamlit has a built-in secrets manager:

.streamlit/secrets.toml
[database]
url = "postgresql://user:password@host:5432/mydb"
python — reading secrets
DATABASE_URL = st.secrets["database"]["url"]
engine = create_engine(DATABASE_URL)
⚠️
Add .streamlit/secrets.toml to your .gitignore. Never commit database credentials to version control.
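Some deployment platforms inject credentials as environment variables rather than a secrets.toml file. A common pattern is to fall back to os.environ — a sketch, where resolve_database_url is a hypothetical helper (not a Streamlit API) and the DATABASE_URL variable name is an assumption:

```python
import os

def resolve_database_url(secret_url=None):
    """Prefer an explicit value (e.g. from st.secrets), then the
    DATABASE_URL environment variable, then a local SQLite file."""
    if secret_url:
        return secret_url
    return os.environ.get("DATABASE_URL", "sqlite:///sales.db")

# No env var set: falls back to the local development database
os.environ.pop("DATABASE_URL", None)
print(resolve_database_url())  # sqlite:///sales.db

# Env var set (as a hosting platform would): it wins over the default
os.environ["DATABASE_URL"] = "postgresql://user:pw@host:5432/db"
print(resolve_database_url())  # postgresql://user:pw@host:5432/db
```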

Parameterized Queries — Prevent SQL Injection

Always use parameterized queries when the query includes user input. Never use f-strings or string concatenation to build SQL queries.

python
# WRONG — SQL injection risk
region = st.selectbox("Region", regions)
df = pd.read_sql(f"SELECT * FROM sales WHERE region = '{region}'", engine)

# RIGHT — parameterized
query = text("SELECT * FROM sales WHERE region = :region")
df = pd.read_sql(query, engine, params={"region": region})
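The same rule applies at the driver level. A small standard-library sketch of what actually goes wrong (the table here is a stand-in created in memory): with an f-string, a crafted input rewrites the query's logic and matches every row; with a placeholder, the whole string is treated as an odd region name that matches nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue INTEGER)")
conn.execute("INSERT INTO sales VALUES ('North', 1000), ('South', 2000)")

malicious = "North' OR '1'='1"

# f-string: the injected OR clause makes the WHERE match every row
unsafe = conn.execute(
    f"SELECT COUNT(*) FROM sales WHERE region = '{malicious}'"
).fetchone()[0]

# placeholder: the input stays a literal value, so nothing matches
safe = conn.execute(
    "SELECT COUNT(*) FROM sales WHERE region = ?", (malicious,)
).fetchone()[0]

print(unsafe, safe)  # 2 0
conn.close()
```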
📝 Day 3 Exercise
Build a Database-Backed Dashboard
  1. Run the create_db.py script from this lesson to generate sales.db.
  2. Build a Streamlit dashboard that loads data from the database using @st.cache_data.
  3. Add a region filter (selectbox) and a product filter (multiselect). Both should filter the query.
  4. Display total revenue, total units, and number of transactions as st.metric() cards.
  5. Add a bar chart showing revenue by product for the filtered data.
  6. Add a "Refresh Data" button that clears the cache with st.cache_data.clear().

Day 3 Summary

  • SQLAlchemy works with SQLite, PostgreSQL, MySQL — just change the connection string.
  • Use @st.cache_resource for the engine (shared), @st.cache_data(ttl=N) for query results.
  • Store credentials in .streamlit/secrets.toml, never in code.
  • Always parameterize queries that include user input — never use f-strings in SQL.
Challenge

Add a date range filter using st.date_input() and pass the start and end dates as parameters to your SQL query. Then add a "Download CSV" button using st.download_button() that exports the currently filtered data.
