An end-to-end pipeline that reads all text files from a folder, sends each through Claude for AI analysis, combines the results into a pandas DataFrame, saves a CSV summary, and can be scheduled to run daily — automatically, without you touching it.
Pipeline Structure
A good pipeline has a clear structure: input → process → output. Each stage is a function. The pipeline function calls them in order.
```python
import os
import glob
from datetime import datetime

import pandas as pd
import anthropic

client = anthropic.Anthropic()


def load_files(folder_path):
    """Read all .txt files from a folder."""
    files = glob.glob(os.path.join(folder_path, "*.txt"))
    documents = []
    for filepath in files:
        with open(filepath, "r", encoding="utf-8") as f:
            documents.append({
                "filename": os.path.basename(filepath),
                "content": f.read()
            })
    print(f"Loaded {len(documents)} files")
    return documents


def analyze_with_claude(document):
    """Send a document to Claude and return structured analysis."""
    prompt = f"""Analyze the following document. Return exactly 3 fields:
1. SUMMARY: One sentence summary
2. SENTIMENT: positive / neutral / negative
3. KEY_TOPICS: comma-separated list of 3-5 key topics

Document:
{document['content'][:3000]}"""
    try:
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}]
        )
        text = response.content[0].text
        return {
            "filename": document["filename"],
            "analysis": text,
            "tokens": response.usage.output_tokens,
            "status": "success"
        }
    except Exception as e:
        return {
            "filename": document["filename"],
            "analysis": "",
            "tokens": 0,
            "status": f"ERROR: {e}"
        }


def save_results(results, output_path):
    """Save results to CSV and return a DataFrame."""
    df = pd.DataFrame(results)
    df["processed_at"] = datetime.now().isoformat()
    df.to_csv(output_path, index=False)
    print(f"Saved {len(df)} results to {output_path}")
    return df


def run_pipeline(input_folder="./documents", output_file="results.csv"):
    """Run the full pipeline: load -> analyze -> save."""
    print(f"Pipeline started: {datetime.now():%Y-%m-%d %H:%M}")
    documents = load_files(input_folder)
    results = [analyze_with_claude(doc) for doc in documents]
    df = save_results(results, output_file)
    success = df[df["status"] == "success"]
    print(f"Done. {len(success)}/{len(df)} succeeded.")
    return df


if __name__ == "__main__":
    run_pipeline()
```
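The pipeline stores Claude's reply as one raw `analysis` string. If you want the three fields as separate columns, a small parser can split them out. Here is a minimal sketch, assuming the model followed the numbered format from the prompt (`parse_analysis` is a helper name introduced here, not part of the pipeline above):

```python
import re

def parse_analysis(text):
    """Extract SUMMARY, SENTIMENT, and KEY_TOPICS from Claude's reply.

    Assumes the model followed the numbered format in the prompt;
    a missing field falls back to an empty string.
    """
    fields = {}
    for key in ("SUMMARY", "SENTIMENT", "KEY_TOPICS"):
        match = re.search(rf"{key}:\s*(.+)", text)
        fields[key.lower()] = match.group(1).strip() if match else ""
    return fields

# Example on a typical reply:
reply = """1. SUMMARY: Quarterly revenue grew 12% on strong cloud demand.
2. SENTIMENT: positive
3. KEY_TOPICS: revenue, cloud, growth"""
print(parse_analysis(reply))
```

You could call this inside `analyze_with_claude` and merge the parsed fields into the returned dict, so each field becomes its own CSV column.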
Scheduling and Automation
Once your pipeline works, you want it to run on a schedule — without you having to trigger it manually.
Option A: cron (Mac/Linux)
```bash
# Open crontab editor
$ crontab -e

# Run pipeline.py every day at 6am
0 6 * * * /usr/bin/python3 /path/to/pipeline.py >> /path/to/pipeline.log 2>&1
```
Option B: schedule library (cross-platform)
```python
# pip install schedule
import schedule
import time

schedule.every().day.at("06:00").do(run_pipeline)

while True:
    schedule.run_pending()
    time.sleep(60)
```
Where does this go from here? This pipeline pattern is the foundation of every serious AI product. Add a database instead of CSV, add an API endpoint, add a frontend — and you have an AI SaaS product.
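"Add a database" can be as small as swapping one function. A hedged sketch using SQLite (bundled with Python) and pandas' `to_sql`; the `pipeline.db` path and `results` table name are arbitrary choices for illustration, not part of the lesson's code:

```python
import sqlite3
import pandas as pd

def save_to_db(results, db_path="pipeline.db"):
    """Append pipeline results to a SQLite table instead of a CSV."""
    df = pd.DataFrame(results)
    with sqlite3.connect(db_path) as conn:
        # if_exists="append" keeps history across daily runs,
        # unlike the CSV version, which overwrites the file
        df.to_sql("results", conn, if_exists="append", index=False)
    return df
```

A drop-in replacement for `save_results` inside `run_pipeline` — and because the table accumulates rows, you can query trends across runs instead of a single day's snapshot.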
Build Your Pipeline
- Create a `documents/` folder with 3–5 text files (copy any articles, reports, or notes you have)
- Run the full pipeline from this lesson and verify the CSV output
- Add a step that reads the results CSV with pandas and prints a summary (total tokens used, success rate, most common sentiment)
- Schedule it to run once using the `schedule` library and verify it fires
- Adapt it to process a type of document from your actual work
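For the summary step in the exercises above, one possible shape — a sketch that assumes the columns produced by `save_results` (`analysis`, `tokens`, `status`); `summarize_results` is a name introduced here:

```python
import pandas as pd

def summarize_results(csv_path="results.csv"):
    """Print headline stats from a pipeline results CSV."""
    df = pd.read_csv(csv_path)
    total_tokens = df["tokens"].sum()
    success_rate = (df["status"] == "success").mean()
    # Pull the sentiment word out of the raw analysis text
    sentiments = df["analysis"].str.extract(r"SENTIMENT:\s*(\w+)")[0]
    most_common = sentiments.mode().iloc[0] if not sentiments.dropna().empty else "n/a"
    print(f"Total tokens: {total_tokens}")
    print(f"Success rate: {success_rate:.0%}")
    print(f"Most common sentiment: {most_common}")
    return total_tokens, success_rate, most_common
```

Call it at the end of `run_pipeline`, or run it standalone against any results file the pipeline has produced.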
What You Learned in 5 Days
- Python fundamentals: variables, loops, functions, error handling
- Data structures: lists, dicts, JSON — the language of APIs
- File I/O: reading and writing text files and CSVs
- API integration: calling Claude from Python and parsing responses
- Data analysis: pandas DataFrames, filtering, groupby, matplotlib
- Pipeline architecture: modular functions, error handling at scale, scheduling
You finished Python for AI.
You went from installing Python to building a production AI pipeline in 5 days. That's not beginner work — that's the foundation of every AI product in production today.
Ready to go further?
The live bootcamp takes everything you learned here and builds it into full AI systems over 3 intensive days — with a cohort, real data, and live instruction.
Reserve Your Seat — $1,490