A full sales analysis script using pandas: load data, filter by region, group by rep, calculate totals and averages, merge with a quota table, and export a formatted Excel report — all in under 30 lines of code.
Install Pandas
$ pip install pandas openpyxl xlsxwriter
Pandas is the data manipulation library. openpyxl and xlsxwriter handle Excel file I/O. You need all three.
DataFrames — Your New Spreadsheet
A pandas DataFrame is a table with named columns. It's exactly what Excel is, but in code.
import pandas as pd
# Read files — like File > Open in Excel
df = pd.read_csv("sales.csv")
df = pd.read_excel("sales.xlsx")
# Inspect — like Ctrl+End to see size
print(df.shape) # (rows, columns)
print(df.columns) # column names
print(df.head()) # first 5 rows
print(df.describe()) # stats summary
# Filter rows — like Excel AutoFilter
east = df[df["region"] == "East"]
big_deals = df[df["revenue"] > 50000]
east_big = df[(df["region"] == "East") & (df["revenue"] > 50000)]
# Select columns — like hiding columns
names_revenue = df[["rep", "revenue"]]
GroupBy — Pivot Tables Done Right
import pandas as pd
df = pd.read_csv("sales.csv")
# Total revenue by region — like a pivot table SUM
by_region = df.groupby("region")["revenue"].sum()
print(by_region)
# Multiple metrics at once
summary = df.groupby("rep").agg({
"revenue": ["sum", "mean", "count"],
"deals": "sum"
}).round(0)
# Sort by total revenue descending
summary = summary.sort_values(("revenue", "sum"), ascending=False)
print(summary)
# Monthly trend
df["month"] = pd.to_datetime(df["date"]).dt.to_period("M")
monthly = df.groupby("month")["revenue"].sum()
print(monthly)
Merge — The VLOOKUP That Actually Works
VLOOKUP breaks on duplicate values, returns only one column at a time, and breaks when you insert columns. Pandas merge does none of those things.
import pandas as pd
sales = pd.read_csv("sales.csv")
quotas = pd.read_csv("quotas.csv") # rep, quota
# Like VLOOKUP but better in every way
merged = sales.merge(quotas, on="rep", how="left")
# Calculate attainment
merged["attainment"] = (merged["revenue"] / merged["quota"] * 100).round(1)
# Who's above quota?
above_quota = merged[merged["attainment"] >= 100]
print(f"{len(above_quota)} reps at or above quota")
# Export to Excel with formatting
merged.to_excel("quota_analysis.xlsx", index=False)
print("Saved quota_analysis.xlsx")
The how="left" parameter keeps all sales rows even if a rep isn't in the quota table. That's the same as VLOOKUP returning N/A — but here you control the behavior precisely.
Day 3 Complete
- Load CSV and Excel files into pandas DataFrames
- Filter rows with conditions more powerful than Excel AutoFilter
- Aggregate data with groupby — faster than any pivot table
- Merge datasets the right way — no more VLOOKUP fragility
- Export results back to Excel with
to_excel()
Tomorrow: Add AI to Your Spreadsheets
Day 4 shows you how to feed real spreadsheet data to Claude and get instant written analysis.
Day 4: Add AI Analysis