Reading Files with Python
Before you can ask Claude to summarize a document, you need to get that document into Python as a string. Here's the only pattern you need to know.
```python
# Read a plain text file
with open("report.txt", "r") as f:
    content = f.read()

print(f"Read {len(content)} characters")
```
Line by Line
open("report.txt", "r")
Opens the file in read mode. The with block automatically closes the file when done — no cleanup needed.
f.read()
Reads the entire file into a single string. For large files, you could use f.readlines() to get a list of lines instead.
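To make the streaming idea concrete, here's a small sketch of processing a file one line at a time instead of calling f.read() — the helper name count_lines is ours, not part of the lesson's code:

```python
def count_lines(path):
    """Stream a file line by line; memory use stays flat even for huge files."""
    with open(path, "r", encoding="utf-8") as f:
        # Iterating the file object reads lines lazily, one at a time
        return sum(1 for _ in f)
```

Iterating the file object directly is usually preferable to f.readlines() for very large files, because it never holds the whole file in memory.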
For CSV files, use Python's built-in csv module:
```python
import csv

with open("data.csv", "r", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    rows = list(reader)

# Each row is a dict: {"column_name": "value", ...}
print(f"Loaded {len(rows)} rows")
print(rows[0])  # First row as dict
```
Always specify encoding="utf-8" when reading files. If you get a UnicodeDecodeError, try encoding="latin-1" as a fallback. Most modern files are UTF-8.
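That fallback pattern can be wrapped in a small helper — a sketch of ours (the name read_text is made up for this example), not part of the lesson's code:

```python
def read_text(path):
    """Read a text file as UTF-8, falling back to Latin-1 for legacy files."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this read always succeeds
        with open(path, "r", encoding="latin-1") as f:
            return f.read()
```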
Sending Documents to Claude
Now combine file reading with the Claude API. This is your first real document summarizer — the kind of tool you can hand to anyone in your office and they'll immediately understand its value.
```python
import anthropic

client = anthropic.Anthropic()

# Step 1: Read the document
with open("report.txt", "r", encoding="utf-8") as f:
    document = f.read()

# Step 2: Send to Claude with a strong system prompt
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    system="You are a professional analyst. Summarize documents with precision. Include key findings, important numbers, and actionable recommendations.",
    messages=[
        {"role": "user", "content": f"Summarize this document in 5 bullet points:\n\n{document}"}
    ],
)

print(message.content[0].text)
```
What's New Here
system=
The system prompt shapes Claude's persona and output style. "Professional analyst" produces tighter, more structured output than no system prompt at all. This parameter is optional but important.
f"...{document}"
Python f-string interpolation. The document content gets inserted directly into the prompt string. Claude reads it as if you typed the whole document yourself.
max_tokens=2048
Maximum length of Claude's response. 2048 tokens ≈ 1,500 words. For summaries, 512–1024 is usually enough. Set it higher for long-form outputs.
Create a report.txt in the same folder with any text — paste in a Wikipedia article, a meeting transcript, an email thread. Run the script. That's your summarizer working.
Structured Data Extraction
Summarization is useful. Extraction is powerful. Instead of prose, you tell Claude to return JSON — and now you have machine-readable data you can store, query, and pipe into other systems.
Here's an invoice extractor. The same pattern works for any document: contracts, resumes, lab reports, financial filings.
```python
import anthropic
import json

client = anthropic.Anthropic()

with open("invoice.txt", "r", encoding="utf-8") as f:
    invoice = f.read()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": f"""Extract the following fields from this invoice as JSON:
- vendor_name
- invoice_number
- date
- line_items (array of {{description, quantity, unit_price, total}})
- subtotal
- tax
- total_due

Invoice:
{invoice}

Return ONLY valid JSON."""}
    ],
)

# Parse the JSON response
data = json.loads(message.content[0].text)
print(json.dumps(data, indent=2))
```
print(f"\nTotal due: ${data['total_due']}")
Adding "Return ONLY valid JSON" to your prompt stops Claude from wrapping the response in explanation text. Without it, Claude might write "Here's the JSON:" before the data, which breaks json.loads(). This one phrase saves you a lot of parsing headaches.
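Even with that instruction, a model can occasionally wrap the JSON in a Markdown code fence. A small defensive parser handles both cases — this helper (parse_json_reply is our own name, not an anthropic API) is a sketch, not part of the lesson's code:

```python
import json

def parse_json_reply(text):
    """Parse a model reply as JSON, tolerating a ```json ... ``` fence wrapper."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        # Drop the opening fence line (e.g. "```json") and the closing fence
        cleaned = cleaned.split("\n", 1)[1]
        cleaned = cleaned.rsplit("```", 1)[0]
    return json.loads(cleaned)
```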
To test this, create an invoice.txt file with realistic invoice content — vendor name, items, prices. Claude handles messy real-world formatting remarkably well. It can extract fields from invoices where the data isn't in a consistent location, making this far more robust than regex.
The json.loads() call will raise an error if Claude's response isn't valid JSON. In production code, wrap it in a try/except block and retry once if it fails. Claude rarely returns invalid JSON when you're explicit, but a retry makes your script far more resilient.
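One way to sketch that retry is to pass the API call in as a function, which also makes the logic easy to test without the network. The names load_json_with_retry, call_model, and attempts are our own, not part of the lesson's code:

```python
import json

def load_json_with_retry(call_model, attempts=2):
    """Call `call_model()` for a JSON string; retry on a parse failure."""
    last_error = None
    for _ in range(attempts):
        try:
            return json.loads(call_model())
        except json.JSONDecodeError as exc:
            last_error = exc  # Remember the failure and try again
    raise last_error
```

In the invoice script, call_model would be a small function wrapping client.messages.create and returning message.content[0].text.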
Processing Multiple Files
This is where the leverage appears. One document is convenient. A hundred documents processed automatically is a workflow transformation. This script loops through a folder, summarizes every file, and writes the results to a CSV.
```python
import anthropic
import os
import csv

client = anthropic.Anthropic()

results = []
for filename in os.listdir("documents"):
    if not filename.endswith(".txt"):
        continue
    with open(os.path.join("documents", filename), "r", encoding="utf-8") as f:
        content = f.read()
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=512,
        messages=[
            {"role": "user", "content": f"In one sentence, what is the main point of this document?\n\n{content}"}
        ],
    )
    summary = message.content[0].text
    results.append({"file": filename, "summary": summary})
    print(f"Processed: {filename}")

# Write results to CSV
with open("summaries.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["file", "summary"])
    writer.writeheader()
    writer.writerows(results)

print(f"\nDone. {len(results)} documents summarized. Results in summaries.csv")
```
How It Works
os.listdir("documents")
Lists every file in the documents/ folder. The if not filename.endswith(".txt") check skips non-text files.
os.path.join(...)
Builds the full file path correctly on any OS. Never concatenate paths with + — this handles Windows backslashes automatically.
csv.DictWriter
Writes a list of dicts to CSV. writeheader() writes the column names. writerows() writes all the data rows at once.
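The path-joining point is worth seeing directly. As a quick sketch (the paths here are just examples), os.path.join and the more modern pathlib produce the same OS-appropriate result:

```python
import os
from pathlib import Path

# os.path.join inserts the right separator for the current OS
joined = os.path.join("documents", "report.txt")

# pathlib builds the same path with the / operator
modern = Path("documents") / "report.txt"

print(joined)
print(modern)
```

Either style is fine; the thing to avoid is gluing strings together with +, which hardcodes one OS's separator.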
You just automated hours of document review into a script that runs in minutes. This is what AI looks like in the real world — not chatting, but processing. A folder with 200 reports becomes a spreadsheet of one-sentence summaries in under 10 minutes of compute time.
What You Built Today
- Read text and CSV files into Python strings
- Built a document summarizer using the system prompt parameter
- Extracted structured JSON data from unstructured documents
- Batch-processed an entire folder of files into a CSV report
- Used os.listdir() and csv.DictWriter for file system operations
Build a Resume Screener
Take the batch processor from Section 4 and adapt it to extract structured data from resumes instead of writing one-sentence summaries. Run it against a folder of .txt resume files and write the results to a spreadsheet.
- Read each resume file from a resumes/ folder
- Ask Claude to extract: name, email, years of experience, top 3 skills
- Tell Claude to "Return ONLY valid JSON"
- Parse the JSON and write all results to candidates.csv
- Bonus: add a column for "fit score" — prompt Claude to rate 1–10 fit for a specific role
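If you get stuck, here is one possible skeleton for the core loop. It takes the API call as a function so you can test the plumbing before spending tokens; the names screen_resumes and ask are our own, and the field names are just this exercise's:

```python
import json
import os

def screen_resumes(folder, ask):
    """Run `ask(prompt)` (e.g. a Claude API call) on each .txt resume in `folder`."""
    rows = []
    for filename in sorted(os.listdir(folder)):
        if not filename.endswith(".txt"):
            continue
        with open(os.path.join(folder, filename), "r", encoding="utf-8") as f:
            resume = f.read()
        prompt = (
            "Extract name, email, years_experience, and top_skills (array of 3) "
            f"from this resume as JSON.\n\nResume:\n{resume}\n\nReturn ONLY valid JSON."
        )
        data = json.loads(ask(prompt))
        data["file"] = filename  # Track which resume each row came from
        rows.append(data)
    return rows
```

From there, writing rows to candidates.csv with csv.DictWriter works exactly as in Section 4.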