Course Overview All Courses Blog Reserve Bootcamp Seat
Excel to AI · Day 3 of 5 ~90 minutes

Pandas — Excel But 100x More Powerful

The library every data professional uses. Every Excel operation you know, but faster, scriptable, and capable of handling millions of rows without breaking a sweat.

Day 1
Day 2
3
Day 3
4
Day 4
5
Day 5
What You'll Build Today

A full sales analysis script using pandas: load data, filter by region, group by rep, calculate totals and averages, merge with a quota table, and export a formatted Excel report — all in under 30 lines of code.

1
Install

Install Pandas

bash
$ pip install pandas openpyxl xlsxwriter

Pandas is the data manipulation library. openpyxl and xlsxwriter handle Excel file I/O. You need all three.

2
Core Concepts

DataFrames — Your New Spreadsheet

A pandas DataFrame is a table with named columns. It's exactly what Excel is, but in code.

pythonpandas_basics.py
import pandas as pd

# Read files — like File > Open in Excel
df = pd.read_csv("sales.csv")
df = pd.read_excel("sales.xlsx")

# Inspect — like Ctrl+End to see size
print(df.shape)         # (rows, columns)
print(df.columns)      # column names
print(df.head())        # first 5 rows
print(df.describe())   # stats summary

# Filter rows — like Excel AutoFilter
east = df[df["region"] == "East"]
big_deals = df[df["revenue"] > 50000]
east_big = df[(df["region"] == "East") & (df["revenue"] > 50000)]

# Select columns — like hiding columns
names_revenue = df[["rep", "revenue"]]
3
Aggregation

GroupBy — Pivot Tables Done Right

pythongroupby.py
import pandas as pd

df = pd.read_csv("sales.csv")

# Total revenue by region — like a pivot table SUM
by_region = df.groupby("region")["revenue"].sum()
print(by_region)

# Multiple metrics at once
summary = df.groupby("rep").agg({
    "revenue": ["sum", "mean", "count"],
    "deals": "sum"
}).round(0)

# Sort by total revenue descending
summary = summary.sort_values(("revenue", "sum"), ascending=False)
print(summary)

# Monthly trend
df["month"] = pd.to_datetime(df["date"]).dt.to_period("M")
monthly = df.groupby("month")["revenue"].sum()
print(monthly)
4
Merging

Merge — The VLOOKUP That Actually Works

VLOOKUP breaks on duplicate values, returns only one column at a time, and breaks when you insert columns. Pandas merge does none of those things.

pythonmerge.py
import pandas as pd

sales = pd.read_csv("sales.csv")
quotas = pd.read_csv("quotas.csv")  # rep, quota

# Like VLOOKUP but better in every way
merged = sales.merge(quotas, on="rep", how="left")

# Calculate attainment
merged["attainment"] = (merged["revenue"] / merged["quota"] * 100).round(1)

# Who's above quota?
above_quota = merged[merged["attainment"] >= 100]
print(f"{len(above_quota)} reps at or above quota")

# Export to Excel with formatting
merged.to_excel("quota_analysis.xlsx", index=False)
print("Saved quota_analysis.xlsx")

The how="left" parameter keeps all sales rows even if a rep isn't in the quota table. That's the same as VLOOKUP returning N/A — but here you control the behavior precisely.

Day 3 Complete

  • Load CSV and Excel files into pandas DataFrames
  • Filter rows with conditions more powerful than Excel AutoFilter
  • Aggregate data with groupby — faster than any pivot table
  • Merge datasets the right way — no more VLOOKUP fragility
  • Export results back to Excel with to_excel()
Day 3 Done

Tomorrow: Add AI to Your Spreadsheets

Day 4 shows you how to feed real spreadsheet data to Claude and get instant written analysis.

Day 4: Add AI Analysis
Finished this lesson?