SQL for AI · Day 5 of 5 ~90 minutes

Build a Data Pipeline — SQL + Python + AI

Wire everything together. Query a SQLite database from Python, transform the data with SQL, feed the results to Claude for analysis. A complete, working pipeline you can show in your portfolio.

What You'll Build Today

A Python script that queries your sales database, generates a revenue summary using SQL, and calls Claude to produce a written analysis — all in under 80 lines of code. This is what a data pipeline looks like in the real world.

1
Setup

Install dependencies

You need two packages: anthropic (the official Claude Python SDK) and pandas (for data manipulation). The sqlite3 module is part of Python's standard library — nothing to install.

bash
pip install anthropic pandas

Get your Anthropic API key from console.anthropic.com. Set it as an environment variable:

bash
# Mac / Linux
export ANTHROPIC_API_KEY="sk-ant-..."

# Windows (Command Prompt)
set ANTHROPIC_API_KEY=sk-ant-...
2
Query the Database

Python + SQLite3 — Querying Programmatically

Python's built-in sqlite3 module connects to any SQLite database file. You can run any SQL query and get results back as a list of tuples or a pandas DataFrame.
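Before the full script, here's the raw-tuple interface in miniature. This sketch uses a throwaway in-memory database (the tiny orders table here is illustrative, not the course's sales.db schema):

```python
import sqlite3

# ":memory:" creates a temporary database that lives only for this process
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 100.0), (2, 250.5)])

# execute() runs any SQL; fetchall() returns the results as a list of tuples
rows = conn.execute("SELECT id, amount FROM orders ORDER BY id").fetchall()
print(rows)  # [(1, 100.0), (2, 250.5)]
conn.close()
```

Tuples are fine for quick checks; for anything you want to filter, pivot, or print nicely, loading into a DataFrame (as below) is usually the better move.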

python · query_db.py
import sqlite3
import pandas as pd

# Connect to the database
conn = sqlite3.connect("sales.db")

# Run a SQL query — get results as a DataFrame
query = """
SELECT
  c.name AS customer,
  c.segment,
  COUNT(*) AS orders,
  SUM(o.amount) AS total_revenue,
  AVG(o.amount) AS avg_order
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'completed'
GROUP BY c.name, c.segment
ORDER BY total_revenue DESC
"""

df = pd.read_sql_query(query, conn)
conn.close()

print(df.to_string(index=False))

Run it: python query_db.py. You'll see a formatted table of customer revenue data in your terminal.
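One habit worth building now: when a query depends on runtime values, bind them with ? placeholders instead of formatting them into the SQL string — this guards against SQL injection. A minimal sketch, again with an illustrative in-memory table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 120.0, "completed"), (1, 80.0, "refunded"), (2, 200.0, "completed")],
)

# Values are bound via ? placeholders -- never interpolated into the SQL text
min_amount = 100
rows = conn.execute(
    "SELECT customer_id, amount FROM orders WHERE status = ? AND amount >= ?",
    ("completed", min_amount),
).fetchall()
print(rows)  # [(1, 120.0), (2, 200.0)]
conn.close()
```

pandas supports the same idea: pd.read_sql_query(query, conn, params=(...)) passes the values through to the driver.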

3
Add AI Analysis

Feed the Data to Claude

Now the interesting part — take the query results and ask Claude to analyze them. The pipeline: query → format as text → send to Claude → print the analysis.

python · pipeline.py
import sqlite3
import pandas as pd
import anthropic

# 1. Query the database
conn = sqlite3.connect("sales.db")
query = """
SELECT
  c.name, c.segment,
  COUNT(*) AS orders,
  ROUND(SUM(o.amount), 2) AS total_revenue,
  ROUND(AVG(o.amount), 2) AS avg_order
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'completed'
GROUP BY c.name, c.segment
ORDER BY total_revenue DESC
"""
df = pd.read_sql_query(query, conn)
conn.close()

# 2. Format as a clean text table
data_text = df.to_string(index=False)
total = df['total_revenue'].sum()

# 3. Send to Claude for analysis (reads ANTHROPIC_API_KEY from the environment)
client = anthropic.Anthropic()

prompt = f"""Here is a sales revenue summary from our database:

{data_text}

Total revenue: ${total:,.2f}

Please analyze this data in 3-4 sentences. Identify:
- Who the top customer is and their share of revenue
- Any patterns in the segment data
- One actionable recommendation for the sales team"""

message = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=512,
    messages=[{"role": "user", "content": prompt}]
)

# 4. Print the analysis
print("\n=== DATA ===\n")
print(data_text)
print(f"\nTotal revenue: ${total:,.2f}")
print("\n=== AI ANALYSIS ===\n")
print(message.content[0].text)

claude-3-5-haiku is Claude's fastest and cheapest model — perfect for pipelines that run frequently. For deeper analysis, swap in claude-3-5-sonnet-20241022.

4
Make It Reusable

Wrap it in a function and schedule it

Production pipelines don't run once. They run on a schedule — daily, weekly, or triggered by an event. Here's a clean version you can extend:

python · pipeline_v2.py
import sqlite3
import pandas as pd
import anthropic


def run_sales_report(db_path: str, query: str, prompt_template: str) -> str:
    """Query a SQLite DB, format results, get AI analysis. Returns the analysis."""
    # Query
    conn = sqlite3.connect(db_path)
    df = pd.read_sql_query(query, conn)
    conn.close()

    # Analyze
    client = anthropic.Anthropic()
    prompt = prompt_template.format(data=df.to_string(index=False))
    msg = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}]
    )
    return msg.content[0].text


# Use it
if __name__ == "__main__":
    analysis = run_sales_report(
        db_path="sales.db",
        query="SELECT name, SUM(amount) AS total FROM orders o JOIN customers c ON o.customer_id=c.id GROUP BY name ORDER BY total DESC",
        prompt_template="Analyze this sales data in 3 sentences:\n\n{data}"
    )
    print(analysis)

To schedule this, use cron on Mac/Linux or Windows Task Scheduler. Or deploy it as a GitHub Action that runs on a schedule.
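As a sketch, a crontab entry that runs the pipeline every weekday at 8 AM might look like this (the paths are placeholders for your own setup):

```shell
# Edit your crontab with: crontab -e
# minute hour day month weekday  command
0 8 * * 1-5  cd /path/to/project && /usr/bin/python3 pipeline_v2.py >> report.log 2>&1
```

Note that cron jobs don't inherit your interactive shell's environment, so ANTHROPIC_API_KEY must be defined in the crontab itself or loaded by the script.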

Day 5 Complete — What You Built

  • Connected Python to SQLite with sqlite3 and pandas
  • Ran SQL queries programmatically and loaded results into DataFrames
  • Fed query results to the Claude API for AI analysis
  • Wrapped the pipeline in a reusable function
  • A complete portfolio project: SQL + Python + AI
Course Complete

You finished SQL for AI and Data

You went from your first SELECT to building an AI-powered data pipeline. That's 5 days of real, usable SQL skill. Where you go next is up to you.
