A sales data loader that reads a CSV, calculates total and average revenue by region, filters for top performers, and prints a clean summary — all in about 15 lines of Python. We'll compare this to doing the same thing in Excel.
Why analysts are making the switch
Excel is great. It's visual, fast to start, and everyone knows it. But it has three problems that Python solves:
Problem 1: Scale. Excel slows to a crawl above 100K rows. Python handles millions of rows in seconds.
Problem 2: Reproducibility. When your boss asks "how did you get this number?" in Excel, you trace 14 formulas. In Python, you show them the code.
Problem 3: Automation. Excel requires a human every time. Python can run the same analysis on every new dataset without you.
This course is not about replacing your Excel skills. It's about adding a new layer on top of them. Your domain knowledge of what the data means is irreplaceable — Python just removes the grunt work.
Install Python and pandas
Step 1: Install Python
Go to python.org/downloads and download Python 3.11 or later. Run the installer. On Windows, check the box that says "Add Python to PATH" before clicking Install.
Verify it worked by opening your terminal (Command Prompt on Windows, Terminal on Mac) and typing:
python --version
You should see something like Python 3.11.4. If it says "command not found," restart your terminal and try again.
Step 2: Install pandas
In your terminal, run:
pip install pandas
This installs pandas — the Python library that works like Excel but faster and scriptable. You only do this once.
Step 3: Get a code editor
Download VS Code from code.visualstudio.com — free. Install the Python extension when prompted. This is where you'll write your scripts for the rest of the course.
Create a sample dataset
We'll create a simple sales CSV to work with. Create a new folder called data-analysis-course, open it in VS Code, and create a file called sales.csv with this content:
region,rep,product,revenue,units
West,Alice,Software,42000,14
East,Bob,Hardware,38000,9
West,Carol,Software,51000,17
North,Dave,Services,29000,22
East,Eve,Software,67000,21
South,Frank,Hardware,33000,11
North,Grace,Services,44000,31
South,Heidi,Software,58000,19
West,Ivan,Hardware,27000,8
East,Judy,Services,72000,48
Load and analyze in Python
Create a new file in VS Code called analyze.py. Here's the full script — we'll walk through each part:
import pandas as pd
# Load the data
df = pd.read_csv("sales.csv")
# See the shape of the data
print(f"Loaded {len(df)} rows and {len(df.columns)} columns")
print(df.head()) # First 5 rows
# Total revenue by region
by_region = df.groupby("region")["revenue"].sum().sort_values(ascending=False)
print("\n--- Revenue by Region ---")
print(by_region)
# Top performers (revenue > $50K)
top_reps = df[df["revenue"] > 50000][["rep", "region", "revenue"]]
print("\n--- Top Performers ---")
print(top_reps.to_string(index=False))
# Summary stats
print(f"\nTotal revenue: ${df['revenue'].sum():,}")
print(f"Average deal: ${df['revenue'].mean():,.0f}")
print(f"Best region: {by_region.index[0]}")
Run it from your terminal:
python analyze.py
What just happened: You loaded a dataset, grouped it by region, filtered for top performers, and calculated summary stats — in about 15 lines. The equivalent Excel workflow involves pivot tables, filters, and manual number-checking. The Python version runs in under a second and works on any size dataset.
What you just learned
import pandas as pd — Loads the pandas library. The pd is just a shorthand name everyone uses.
pd.read_csv("file.csv") — Reads any CSV file into a DataFrame. A DataFrame is like an Excel sheet you can program.
df.groupby("column")["other_column"].sum() — Group rows by a category and aggregate. Same as a pivot table, one line.
df[df["column"] > value] — Filter rows by a condition. Same as Excel's filter, but scriptable.
df["column"].sum() — Sum a column. Works on millions of rows without crashing.
Day 1 Complete
- Installed Python and pandas
- Loaded a CSV file into a pandas DataFrame
- Calculated revenue totals by region with groupby
- Filtered for top performers with a condition
- Printed formatted summary statistics
Next: Pandas Fundamentals
Day 2 covers the four operations that handle 90% of real data analysis: loading messy data, cleaning it, filtering it, and grouping it properly.
Go to Day 2