Course OverviewFree AI CourseBlogReserve Bootcamp Seat
Python for AI · Day 4 of 5 ~75 minutes

Data Analysis with Pandas

Load real data into pandas DataFrames, filter and group it, visualize it, and find genuine insights — not toy examples.

1
Day 1
2
Day 2
3
Day 3
4
Day 4
5
Day 5
What You'll Build Today

Load a real-world dataset (a public sales CSV), perform filtering and grouping analysis on it, build two visualizations with matplotlib, and write out 3 genuine insights as a summary. The workflow data analysts do manually every day — automated.

1
Setup

Installing pandas and matplotlib

bash
$ pip install pandas matplotlib

Then grab a dataset to work with. The Superstore sales dataset is a classic — it has orders, regions, categories, and revenue, which makes it ideal for practicing analysis.

pythonload_data.py
import pandas as pd

# Load a CSV into a DataFrame
df = pd.read_csv("superstore.csv")

# First look at the data
print(df.head())       # first 5 rows
print(df.shape)        # (rows, columns)
print(df.columns.tolist())  # column names
print(df.dtypes)       # data type of each column
print(df.describe())   # stats: mean, std, min, max

# Check for missing values
print(df.isnull().sum())
2
Filter & Group

Filtering, Grouping, and Aggregating

pythonanalysis.py
import pandas as pd

df = pd.read_csv("superstore.csv")

# Filter rows
tech_orders = df[df["Category"] == "Technology"]
big_orders = df[df["Sales"] > 1000]

# Multiple conditions
west_tech = df[(df["Region"] == "West") & (df["Category"] == "Technology")]

# GroupBy: total sales by region
regional_sales = df.groupby("Region")["Sales"].sum().sort_values(ascending=False)
print(regional_sales)

# Multiple aggregations
summary = df.groupby("Category").agg({
    "Sales": ["sum", "mean"],
    "Profit": "sum",
    "Order ID": "count"
})
print(summary)

# Top 10 most profitable products
top_products = df.groupby("Product Name")["Profit"].sum().nlargest(10)
print(top_products)

# Profit margin by segment
df["profit_margin"] = df["Profit"] / df["Sales"]
print(df.groupby("Segment")["profit_margin"].mean())
3
Visualize

Basic Visualization with Matplotlib

pythoncharts.py
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("superstore.csv")

# Bar chart: sales by category
category_sales = df.groupby("Category")["Sales"].sum()
plt.figure(figsize=(8, 5))
category_sales.plot(kind="bar", color="#1e3a5f")
plt.title("Total Sales by Category")
plt.xlabel("Category")
plt.ylabel("Sales ($)")
plt.xticks(rotation=0)
plt.tight_layout()
plt.savefig("sales_by_category.png")
plt.show()

# Scatter: sales vs profit
plt.figure(figsize=(8, 5))
plt.scatter(df["Sales"], df["Profit"], alpha=0.3, color="#c4873e")
plt.title("Sales vs Profit")
plt.xlabel("Sales ($)")
plt.ylabel("Profit ($)")
plt.tight_layout()
plt.savefig("sales_vs_profit.png")
plt.show()

Tomorrow: You'll replace the manual "find insights" step with Claude. Instead of staring at charts, you'll feed the numbers to the API and get written analysis back automatically.

Today's Exercise

Analyze a Real Dataset — Find 3 Insights

  • Download the Superstore CSV from Kaggle (free account needed) or use any CSV you have
  • Load it with pandas and run .describe() to understand the shape
  • Write 3 groupby queries that each answer a specific business question
  • Build 2 charts using matplotlib and save them as PNG files
  • Write your 3 insights as Python comments next to the code that revealed them

What You Learned Today

  • Loading CSVs into pandas DataFrames and inspecting their structure
  • Filtering rows with boolean conditions and multiple conditions
  • Groupby + aggregate: sum, mean, count, nlargest — the core analysis toolkit
  • Building bar charts and scatter plots with matplotlib
  • Creating new computed columns from existing ones
Course Progress
Day 4 of 5 — 80%
Day 4 Complete

Day 5: Build an AI-Powered Data Pipeline

Tomorrow you combine everything — files, APIs, pandas, and Claude — into one end-to-end pipeline that runs itself.

Start Day 5