Day 1: Why analysts are making the switch

Today's Objective

A sales data loader that reads a CSV, calculates total and average revenue by region, filters for top performers, and prints a clean summary — all in about 15 lines of Python. We'll compare this to doing the same thing in Excel.

A sales data loader that reads a CSV, calculates total and average revenue by region, filters for top performers, and prints a clean summary — all in about 15 lines of Python. We'll compare this to doing the same thing in Excel.

Why analysts are making the switch

Excel is great. It's visual, fast to start, and everyone knows it. But it has three problems that Python solves:

Problem 1: Scale. Excel slows to a crawl above 100K rows. Python handles millions of rows in seconds.

Problem 2: Reproducibility. When your boss asks "how did you get this number?" in Excel, you trace 14 formulas. In Python, you show them the code.

Problem 3: Automation. Excel requires a human every time. Python can run the same analysis on every new dataset without you.

This course is not about replacing your Excel skills. It's about adding a new layer on top of them. Your domain knowledge of what the data means is irreplaceable — Python just removes the grunt work.

Install Python and pandas

Step 1: Install Python

Go to python.org/downloads and download Python 3.11 or later. Run the installer. On Windows, check the box that says "Add Python to PATH" before clicking Install.

Verify it worked by opening your terminal (Command Prompt on Windows, Terminal on Mac) and typing:

bash.txt

BASH

python --version

You should see something like Python 3.11.4. If it says "command not found," restart your terminal and try again.

Step 2: Install pandas

In your terminal, run:

bash.txt

BASH

pip install pandas

This installs pandas — the Python library that works like Excel but faster and scriptable. You only do this once.

Step 3: Get a code editor

Download VS Code from code.visualstudio.com — free. Install the Python extension when prompted. This is where you'll write your scripts for the rest of the course.

Create a sample dataset

We'll create a simple sales CSV to work with. Create a new folder called data-analysis-course, open it in VS Code, and create a file called sales.csv with this content:

sales.csv

CSV

region,rep,product,revenue,units
West,Alice,Software,42000,14
East,Bob,Hardware,38000,9
West,Carol,Software,51000,17
North,Dave,Services,29000,22
East,Eve,Software,67000,21
South,Frank,Hardware,33000,11
North,Grace,Services,44000,31
South,Heidi,Software,58000,19
West,Ivan,Hardware,27000,8
East,Judy,Services,72000,48

Load and analyze in Python

Create a new file in VS Code called analyze.py. Here's the full script — we'll walk through each part:

analyze.py

PYTHON

import pandas as pd

# Load the data
df = pd.read_csv("sales.csv")

# See the shape of the data
print(f"Loaded {len(df)} rows and {len(df.columns)} columns")
print(df.head())  # First 5 rows

# Total revenue by region
by_region = df.groupby("region")["revenue"].sum().sort_values(ascending=False)
print("\n--- Revenue by Region ---")
print(by_region)

# Top performers (revenue > $50K)
top_reps = df[df["revenue"] > 50000][["rep", "region", "revenue"]]
print("\n--- Top Performers ---")
print(top_reps.to_string(index=False))

# Summary stats
print(f"\nTotal revenue: ${df['revenue'].sum():,}")
print(f"Average deal: ${df['revenue'].mean():,.0f}")
print(f"Best region: {by_region.index[0]}")

Run it from your terminal:

bash.txt

BASH

python analyze.py

What just happened: You loaded a dataset, grouped it by region, filtered for top performers, and calculated summary stats — in about 15 lines. The equivalent Excel workflow involves pivot tables, filters, and manual number-checking. The Python version runs in under a second and works on any size dataset.

What you just learned

import pandas as pd — Loads the pandas library. The pd is just a shorthand name everyone uses.

pd.read_csv("file.csv") — Reads any CSV file into a DataFrame. A DataFrame is like an Excel sheet you can program.

df.groupby("column")["other_column"].sum() — Group rows by a category and aggregate. Same as a pivot table, one line.

df[df["column"] > value] — Filter rows by a condition. Same as Excel's filter, but scriptable.

df["column"].sum() — Sum a column. Works on millions of rows without crashing.

tById('navLinks').classList.remove('open')); }); (function(){ const bar = document.getElementById('readingProgress'); const body = document.querySelector('.lesson-body'); function update(){ if (!body) return; const rect = body.getBoundingClientRect(); const total = rect.height - window.innerHeight + rect.top; const scrolled = Math.min(Math.max(-rect.top, 0), total); const pct = total > 0 ? (scrolled / total) * 100 : 0; bar.style.width = pct + '%'; } window.addEventListener('scroll', update, { passive: true }); window.addEventListener('resize', update); update(); })();

What's Next

The foundations from today carry directly into Day 2. In the next session the focus shifts to Day 2 — building directly on everything covered here.

Supporting Videos & Reading

Go deeper with these external references.

YouTube

Day 1 — Video Walkthroughs Community tutorials and walkthroughs covering the concepts in this lesson.

→

YouTube

Day 1 Explained Deep-dive explanations and live-coding sessions from top educators.

→

Official Docs

Official Documentation Primary reference documentation for the technologies covered in this lesson.

→

GitHub

Open Source Examples Real-world codebases demonstrating the patterns taught in this lesson.

→

Day 1 Checkpoint

Before moving on, verify you can answer these without looking:

What is the core concept introduced in this lesson, and why does it matter?
What are the two or three most common mistakes practitioners make with this topic?
Can you explain the key code pattern from this lesson to a colleague in plain language?
What would break first if you skipped the safeguards or best practices described here?
How does today's topic connect to what comes in Day 2?

Live Bootcamp

Learn this in person — 2 days, 5 cities

Thu–Fri sessions in Denver, Los Angeles, New York, Chicago, and Dallas. $1,490 per seat. June–October 2026.

Reserve Your Seat →

Continue To Day 2

Day 2

→