Day 2 of 5 ~50 minutes

Make AI Read Your Documents

Yesterday you sent your first message to Claude. Today you go from a one-liner to a real tool: a Python script that reads files from disk, feeds them to Claude, and returns summaries, structured data, and batch reports. This is the kind of code people pay consultants $300/hour to write.

1
10 minutes

Reading Files with Python

Before you can ask Claude to summarize a document, you need to get that document into Python as a string. Here's the only pattern you need to know.

read_text.py
# Read a plain text file
with open("report.txt", "r") as f:
    content = f.read()

print(f"Read {len(content)} characters")

Line by Line

open("report.txt", "r")

Opens the file in read mode. The with block automatically closes the file when done — no cleanup needed.

f.read()

Reads the entire file into a single string. If you want a list of lines instead, use f.readlines(); for very large files, iterate over the file object directly so you process one line at a time without loading everything into memory.
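If a list of lines is more convenient than one big string, a small helper can wrap this pattern (the name read_lines is just an example, not part of any library):

```python
def read_lines(path):
    """Return a file's lines as a list, without trailing newlines."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]
```

Iterating the open file object reads one line at a time, so for huge files you can process each line inside the loop instead of building the full list.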

For CSV files, use Python's built-in csv module:

read_csv.py
import csv

with open("data.csv", "r", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    rows = list(reader)

# Each row is a dict: {"column_name": "value", ...}
print(f"Loaded {len(rows)} rows")
print(rows[0])  # First row as dict
Encoding Note

Always specify encoding="utf-8" when reading files. If you get a UnicodeDecodeError, try encoding="latin-1" as a fallback. Most modern files are UTF-8.
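That fallback fits in a few lines. Here's one way to sketch it (read_any is a hypothetical helper name):

```python
def read_any(path):
    """Read a text file as UTF-8, falling back to latin-1
    for files saved in older single-byte encodings."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except UnicodeDecodeError:
        with open(path, "r", encoding="latin-1") as f:
            return f.read()
```

latin-1 can decode any byte sequence, so this never raises; the trade-off is that a genuinely non-latin-1 file will come back as mojibake rather than an error.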

2
10 minutes

Sending Documents to Claude

Now combine file reading with the Claude API. This is your first real document summarizer — the kind of tool you can hand to anyone in your office and they'll immediately understand its value.

summarize.py
import anthropic

client = anthropic.Anthropic()

# Step 1: Read the document
with open("report.txt", "r", encoding="utf-8") as f:
    document = f.read()

# Step 2: Send to Claude with a strong system prompt
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    system="You are a professional analyst. Summarize documents with precision. Include key findings, important numbers, and actionable recommendations.",
    messages=[
        {"role": "user", "content": f"Summarize this document in 5 bullet points:\n\n{document}"}
    ]
)

print(message.content[0].text)

What's New Here

system=

The system prompt shapes Claude's persona and output style. "Professional analyst" produces tighter, more structured output than no system prompt at all. This parameter is optional but important.

f"...{document}"

Python f-string interpolation. The document content gets inserted directly into the prompt string. Claude reads it as if you typed the whole document yourself.

max_tokens=2048

Maximum length of Claude's response. 2048 tokens ≈ 1,500 words. For summaries, 512–1024 is usually enough. Set it higher for long-form outputs.

Create a report.txt in the same folder with any text — paste in a Wikipedia article, a meeting transcript, an email thread. Run the script. That's your summarizer working.

3
15 minutes

Structured Data Extraction

Summarization is useful. Extraction is powerful. Instead of prose, you tell Claude to return JSON — and now you have machine-readable data you can store, query, and pipe into other systems.

Here's an invoice extractor. The same pattern works for any document: contracts, resumes, lab reports, financial filings.

extract_invoice.py
import anthropic
import json

client = anthropic.Anthropic()

with open("invoice.txt", "r", encoding="utf-8") as f:
    invoice = f.read()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": f"""Extract the following fields from this invoice as JSON:
- vendor_name
- invoice_number
- date
- line_items (array of {{description, quantity, unit_price, total}})
- subtotal
- tax
- total_due

Invoice:
{invoice}

Return ONLY valid JSON."""}
    ]
)

# Parse the JSON response
data = json.loads(message.content[0].text)
print(json.dumps(data, indent=2))
print(f"\nTotal due: ${data['total_due']}")
The "Return ONLY valid JSON" Pattern

Adding "Return ONLY valid JSON" to your prompt stops Claude from wrapping the response in explanation text. Without it, Claude might write "Here's the JSON:" before the data, which breaks json.loads(). This one phrase saves you a lot of parsing headaches.

To test this, create an invoice.txt file with realistic invoice content — vendor name, items, prices. Claude handles messy real-world formatting remarkably well. It can extract fields from invoices where the data isn't in a consistent location, making this far more robust than regex.
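If you don't have an invoice lying around, a few lines of Python can generate a plausible one (every value below is made up for testing):

```python
# Write a small fake invoice to test the extractor against
sample_invoice = """ACME Supplies
Invoice #INV-1042
Date: 2024-03-15

Widgets        x 10   @ $4.50    $45.00
Gadget cases   x 2    @ $12.00   $24.00

Subtotal: $69.00
Tax (8%): $5.52
Total due: $74.52
"""

with open("invoice.txt", "w", encoding="utf-8") as f:
    f.write(sample_invoice)
```

Once it parses cleanly, try mangling the formatting (move the total, reorder lines) and watch the extractor keep working.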

💡

The json.loads() call will raise an error if Claude's response isn't valid JSON. In production code, wrap it in a try/except block and retry once if it fails. Claude rarely returns invalid JSON when you're explicit, but adding a retry loop makes your code bulletproof.
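A minimal version of that retry loop might look like this; ask_model is a placeholder for whatever function wraps client.messages.create and returns the response text:

```python
import json

def parse_json_with_retry(ask_model, prompt, retries=1):
    """Ask the model for JSON and parse it, re-asking once
    (with a nudge) if the reply isn't valid JSON."""
    for attempt in range(retries + 1):
        text = ask_model(prompt)
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            if attempt == retries:
                raise
            prompt += "\n\nYour previous reply was not valid JSON. Return ONLY valid JSON."
```

Because ask_model is just a function argument, you can unit-test the loop with canned responses and never spend an API call.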

4
15 minutes

Processing Multiple Files

This is where the leverage appears. One document is convenient. A hundred documents processed automatically is a workflow transformation. This script loops through a folder, summarizes every file, and writes the results to a CSV.

batch_summarize.py
import anthropic
import os
import csv

client = anthropic.Anthropic()
results = []

for filename in os.listdir("documents"):
    if not filename.endswith(".txt"):
        continue

    with open(os.path.join("documents", filename), "r", encoding="utf-8") as f:
        content = f.read()

    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=512,
        messages=[
            {"role": "user", "content": f"In one sentence, what is the main point of this document?\n\n{content}"}
        ]
    )

    summary = message.content[0].text
    results.append({"file": filename, "summary": summary})
    print(f"Processed: {filename}")

# Write results to CSV
with open("summaries.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["file", "summary"])
    writer.writeheader()
    writer.writerows(results)

print(f"\nDone. {len(results)} documents summarized. Results in summaries.csv")

How It Works

os.listdir("documents")

Lists every file in the documents/ folder. The if not filename.endswith(".txt") check skips non-text files.

os.path.join(...)

Builds the full file path correctly on any OS. Never concatenate paths with + — this handles Windows backslashes automatically.
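An equivalent approach (a sketch, not the script above) uses the standard library's pathlib, which handles joining and filtering in one step:

```python
from pathlib import Path

def list_text_files(folder):
    """Return every .txt path in a folder, sorted for a stable order."""
    return sorted(Path(folder).glob("*.txt"))
```

Each result is a Path object you can pass straight to open(), on any OS.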

csv.DictWriter

Writes a list of dicts to CSV. writeheader() writes the column names. writerows() writes all the data rows at once.

You just automated hours of document review into a script that runs in minutes. This is what AI looks like in the real world — not chatting, but processing. A folder with 200 reports becomes a spreadsheet of one-sentence summaries in under 10 minutes of compute time.

What You Built Today

  • Read text and CSV files into Python strings
  • Built a document summarizer using the system prompt parameter
  • Extracted structured JSON data from unstructured documents
  • Batch-processed an entire folder of files into a CSV report
  • Used os.listdir() and csv.DictWriter for file system operations
Your Challenge

Build a Resume Screener

Take the batch processor from Section 4 and adapt it to extract structured data from resumes instead of writing one-sentence summaries. Run it against a folder of .txt resume files and write the results to a spreadsheet.

  • Read each resume file from a resumes/ folder
  • Ask Claude to extract: name, email, years of experience, top 3 skills
  • Tell Claude to "Return ONLY valid JSON"
  • Parse the JSON and write all results to candidates.csv
  • Bonus: add a column for "fit score" — prompt Claude to rate 1–10 fit for a specific role