Python for AI · Day 5 of 5 · ~90 minutes

Build an AI-Powered Data Pipeline

Everything comes together today. Read data, analyze it with Claude, save results, and schedule it to run automatically.

What You'll Build Today

An end-to-end pipeline that reads all text files from a folder, sends each through Claude for AI analysis, combines the results into a pandas DataFrame, saves a CSV summary, and can be scheduled to run daily — automatically, without you touching it.

1. Architecture

Pipeline Structure

A good pipeline has a clear structure: input → process → output. Each stage is a function. The pipeline function calls them in order.

python · pipeline.py
import os
import glob
import pandas as pd
import anthropic
from datetime import datetime

client = anthropic.Anthropic()

def load_files(folder_path):
    """Read all .txt files from a folder."""
    files = glob.glob(os.path.join(folder_path, "*.txt"))
    documents = []
    for filepath in files:
        with open(filepath, "r", encoding="utf-8") as f:
            documents.append({
                "filename": os.path.basename(filepath),
                "content": f.read()
            })
    print(f"Loaded {len(documents)} files")
    return documents

def analyze_with_claude(document):
    """Send a document to Claude and return structured analysis."""
    prompt = f"""Analyze the following document. Return exactly 3 fields:
1. SUMMARY: One sentence summary
2. SENTIMENT: positive / neutral / negative
3. KEY_TOPICS: comma-separated list of 3-5 key topics

Document:
{document['content'][:3000]}"""

    try:
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}]
        )
        text = response.content[0].text
        return {
            "filename": document["filename"],
            "analysis": text,
            "tokens": response.usage.output_tokens,
            "status": "success"
        }
    except Exception as e:
        return {"filename": document["filename"], "analysis": "", "tokens": 0, "status": f"ERROR: {e}"}

def save_results(results, output_path):
    """Save results to CSV and return a DataFrame."""
    df = pd.DataFrame(results)
    df["processed_at"] = datetime.now().isoformat()
    df.to_csv(output_path, index=False)
    print(f"Saved {len(df)} results to {output_path}")
    return df

def run_pipeline(input_folder="./documents", output_file="results.csv"):
    """Run the full pipeline."""
    print(f"Pipeline started: {datetime.now():%Y-%m-%d %H:%M}")
    documents = load_files(input_folder)
    results = [analyze_with_claude(doc) for doc in documents]
    df = save_results(results, output_file)
    success = df[df["status"] == "success"]
    print(f"Done. {len(success)}/{len(df)} succeeded.")
    return df

if __name__ == "__main__":
    run_pipeline()
2. Automation

Scheduling and Automation

Once your pipeline works, you want it to run on a schedule — without you having to trigger it manually.

Option A: cron (Mac/Linux)

bash
# Open crontab editor
$ crontab -e

# Run pipeline.py every day at 6am
0 6 * * * /usr/bin/python3 /path/to/pipeline.py >> /path/to/pipeline.log 2>&1

Option B: schedule library (cross-platform)

python
# pip install schedule
import schedule
import time

from pipeline import run_pipeline  # the entry point defined in pipeline.py above

schedule.every().day.at("06:00").do(run_pipeline)

while True:
    schedule.run_pending()
    time.sleep(60)

Where does this go from here? This pipeline pattern is the foundation of every serious AI product. Add a database instead of CSV, add an API endpoint, add a frontend — and you have an AI SaaS product.
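Swapping the CSV for a database is a small first step in that direction: pandas can write a DataFrame straight into SQLite with `to_sql`. A minimal sketch (the `analyses` table name and the demo row are made up for this example):

```python
# Sketch: persist pipeline results to SQLite instead of (or alongside) a CSV.
# The table name "analyses" is an arbitrary choice for this example.
import sqlite3
import pandas as pd
from datetime import datetime

def save_to_db(results, db_path="results.db"):
    """Append pipeline results to a SQLite table and return the DataFrame."""
    df = pd.DataFrame(results)
    df["processed_at"] = datetime.now().isoformat()
    with sqlite3.connect(db_path) as conn:
        df.to_sql("analyses", conn, if_exists="append", index=False)
    return df

# Demo with a fake result; the real pipeline would pass its own results list.
demo = save_to_db(
    [{"filename": "a.txt", "analysis": "...", "tokens": 42, "status": "success"}],
    db_path=":memory:",  # in-memory database so the demo leaves no file behind
)
print(f"Stored {len(demo)} rows")
```

Because `save_results` and `save_to_db` take the same results list, you can run both during a migration and drop the CSV once the database path is proven.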

Final Exercise

Build Your Pipeline

  • Create a documents/ folder with 3–5 text files (copy any articles, reports, or notes you have)
  • Run the full pipeline from this lesson and verify the CSV output
  • Add a step that reads the results CSV with pandas and prints a summary (total tokens used, success rate, most common sentiment)
  • Schedule it to run once using the schedule library and verify it fires
  • Adapt it to process a type of document from your actual work
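For the summary step, one possible sketch: the column names (`filename`, `analysis`, `tokens`, `status`) match the pipeline's CSV output, but the sample rows below are invented for illustration — in the real exercise you would load `results.csv` with `pd.read_csv` instead.

```python
import pandas as pd

def summarize_results(df):
    """Print totals from a pipeline results DataFrame."""
    total_tokens = df["tokens"].sum()
    success_rate = (df["status"] == "success").mean() * 100
    # Pull the SENTIMENT line out of each successful analysis
    sentiments = (
        df.loc[df["status"] == "success", "analysis"]
          .str.extract(r"SENTIMENT:\s*(\w+)", expand=False)
          .str.lower()
    )
    most_common = sentiments.mode().iloc[0] if not sentiments.empty else "n/a"
    print(f"Total tokens: {total_tokens}")
    print(f"Success rate: {success_rate:.0f}%")
    print(f"Most common sentiment: {most_common}")
    return total_tokens, success_rate, most_common

# In the real exercise, replace this sample with: df = pd.read_csv("results.csv")
sample = pd.DataFrame([
    {"filename": "a.txt", "analysis": "SUMMARY: ...\nSENTIMENT: positive\nKEY_TOPICS: x",
     "tokens": 120, "status": "success"},
    {"filename": "b.txt", "analysis": "SUMMARY: ...\nSENTIMENT: positive\nKEY_TOPICS: y",
     "tokens": 90, "status": "success"},
    {"filename": "c.txt", "analysis": "", "tokens": 0, "status": "ERROR: timeout"},
])
summarize_results(sample)
```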

What You Learned in 5 Days

  • Python fundamentals: variables, loops, functions, error handling
  • Data structures: lists, dicts, JSON — the language of APIs
  • File I/O: reading and writing text files and CSVs
  • API integration: calling Claude from Python and parsing responses
  • Data analysis: pandas DataFrames, filtering, groupby, matplotlib
  • Pipeline architecture: modular functions, error handling at scale, scheduling
Course Complete

You finished Python for AI.

You went from installing Python to building a production AI pipeline in 5 days. That's not beginner work — that's the foundation of every AI product in production today.

Ready to go further?

The live bootcamp takes everything you learned here and builds it into full AI systems over 3 intensive days — with a cohort, real data, and live instruction.

Reserve Your Seat — $1,490