CSV files are fine for prototypes. Real dashboards pull from real databases. Today you'll connect Streamlit to SQLite (locally) and PostgreSQL (for production), use SQLAlchemy, and cache queries so your app stays fast.
SQLite is perfect for development and small apps. It's a single file, no server required. You can switch to PostgreSQL or MySQL later with just a connection string change.
pip install sqlalchemy
import sqlite3
import pandas as pd
import numpy as np
# Create and populate a sample database
conn = sqlite3.connect("sales.db")
np.random.seed(42)
n = 500
df = pd.DataFrame({
"date": pd.date_range("2024-01-01", periods=n).strftime("%Y-%m-%d"),
"region": np.random.choice(["North", "South", "East", "West"], n),
"product": np.random.choice(["Widget A", "Widget B", "Widget C"], n),
"revenue": np.random.randint(500, 5000, n),
"units": np.random.randint(1, 50, n)
})
df.to_sql("sales", conn, if_exists="replace", index=False)
conn.close()
print("Database created: sales.db")
Save this script as create_db.py and run it once to create your database. You'll query it from the dashboard.
SQLAlchemy is the standard Python SQL toolkit. It handles connection pooling, parameterized queries, and works with every major database. The connection string format changes per database but the rest of your code stays the same.
# SQLite (local file)
DATABASE_URL = "sqlite:///sales.db"
# PostgreSQL
DATABASE_URL = "postgresql://user:password@localhost:5432/mydb"
# PostgreSQL on Supabase / Railway / Render
DATABASE_URL = "postgresql://user:password@host:5432/dbname"
# MySQL
DATABASE_URL = "mysql+pymysql://user:password@localhost/mydb"
Without caching, Streamlit re-runs your whole script — and every database query in it — on each user interaction. That's slow and expensive. @st.cache_data stores the result and re-runs the query only when the function's arguments change or the cache expires.
import streamlit as st
import pandas as pd
from sqlalchemy import create_engine, text
DATABASE_URL = "sqlite:///sales.db"
@st.cache_resource
def get_engine():
    """Create the database engine once and reuse it."""
    return create_engine(DATABASE_URL)

@st.cache_data(ttl=300)  # cache for 5 minutes
def load_sales(region=None):
    """Load sales data, optionally filtered by region."""
    engine = get_engine()
    if region and region != "All":
        query = text("SELECT * FROM sales WHERE region = :region")
        df = pd.read_sql(query, engine, params={"region": region})
    else:
        df = pd.read_sql("SELECT * FROM sales", engine)
    df["date"] = pd.to_datetime(df["date"])
    return df
# UI
st.title("Sales Dashboard")
# Get unique regions for the filter
engine = get_engine()
regions_df = pd.read_sql("SELECT DISTINCT region FROM sales ORDER BY region", engine)
regions = ["All"] + regions_df["region"].tolist()
selected_region = st.selectbox("Region", regions)
df = load_sales(selected_region)
# Metrics row
col1, col2, col3 = st.columns(3)
col1.metric("Total Revenue", f"${df['revenue'].sum():,.0f}")
col2.metric("Total Units", f"{df['units'].sum():,}")
col3.metric("Avg Order Value", f"${df['revenue'].mean():,.0f}")
# Chart
import plotly.express as px
monthly = df.groupby(df["date"].dt.to_period("M").astype(str))["revenue"].sum().reset_index()
monthly.columns = ["month", "revenue"]
fig = px.bar(monthly, x="month", y="revenue", title="Monthly Revenue")
st.plotly_chart(fig, use_container_width=True)
@st.cache_resource vs @st.cache_data: use cache_resource for shared objects like database engines (created once, shared across all users and sessions). Use cache_data for data results like query results (cached per unique set of arguments).

Never hardcode database credentials in your script. Streamlit has a built-in secrets manager. Put credentials in .streamlit/secrets.toml:
[database]
url = "postgresql://user:password@host:5432/mydb"
Then read the URL in your app:
DATABASE_URL = st.secrets["database"]["url"]
engine = create_engine(DATABASE_URL)
Add .streamlit/secrets.toml to your .gitignore. Never commit database credentials to version control.

Always use parameterized queries when a query includes user input. Never build SQL with f-strings or string concatenation.
# WRONG — SQL injection risk
region = st.selectbox("Region", regions)
df = pd.read_sql(f"SELECT * FROM sales WHERE region = '{region}'", engine)
# RIGHT — parameterized
query = text("SELECT * FROM sales WHERE region = :region")
df = pd.read_sql(query, engine, params={"region": region})
Key takeaways: run the create_db.py script from this lesson to generate sales.db. Use @st.cache_resource for the engine (shared) and @st.cache_data(ttl=N) for query results. Display totals with st.metric() cards, and clear cached results with st.cache_data.clear() when you need fresh data. Keep credentials in .streamlit/secrets.toml, never in code.

Exercise: add a date range filter using st.date_input() and pass the start and end dates as parameters to your SQL query. Then add a "Download CSV" button using st.download_button() that exports the currently filtered data.