Course Overview All Courses Blog Reserve Bootcamp Seat
Data Analysis · Day 1 of 5 ~60 minutes

Spreadsheets to Python — Why Analysts Are Switching

Install Python, load your first real dataset, and see what's possible when you stop fighting with Excel.

1
Day 1
2
Day 2
3
Day 3
4
Day 4
5
Day 5
What You'll Build Today

A sales data loader that reads a CSV, calculates total and average revenue by region, filters for top performers, and prints a clean summary — all in about 15 lines of Python. We'll compare this to doing the same thing in Excel.

1
The Case for Python

Why analysts are making the switch

Excel is great. It's visual, fast to start, and everyone knows it. But it has three problems that Python solves:

Problem 1: Scale. Excel slows to a crawl above 100K rows. Python handles millions of rows in seconds.

Problem 2: Reproducibility. When your boss asks "how did you get this number?" in Excel, you trace 14 formulas. In Python, you show them the code.

Problem 3: Automation. Excel requires a human every time. Python can run the same analysis on every new dataset without you.

This course is not about replacing your Excel skills. It's about adding a new layer on top of them. Your domain knowledge of what the data means is irreplaceable — Python just removes the grunt work.

2
Setup

Install Python and pandas

Step 1: Install Python

Go to python.org/downloads and download Python 3.11 or later. Run the installer. On Windows, check the box that says "Add Python to PATH" before clicking Install.

Verify it worked by opening your terminal (Command Prompt on Windows, Terminal on Mac) and typing:

bash
python --version

You should see something like Python 3.11.4. If it says "command not found," restart your terminal and try again.

Step 2: Install pandas

In your terminal, run:

bash
pip install pandas

This installs pandas — the Python library that works like Excel but faster and scriptable. You only do this once.

Step 3: Get a code editor

Download VS Code from code.visualstudio.com — free. Install the Python extension when prompted. This is where you'll write your scripts for the rest of the course.

3
Your First Dataset

Create a sample dataset

We'll create a simple sales CSV to work with. Create a new folder called data-analysis-course, open it in VS Code, and create a file called sales.csv with this content:

csvsales.csv
region,rep,product,revenue,units
West,Alice,Software,42000,14
East,Bob,Hardware,38000,9
West,Carol,Software,51000,17
North,Dave,Services,29000,22
East,Eve,Software,67000,21
South,Frank,Hardware,33000,11
North,Grace,Services,44000,31
South,Heidi,Software,58000,19
West,Ivan,Hardware,27000,8
East,Judy,Services,72000,48
4
Your First Script

Load and analyze in Python

Create a new file in VS Code called analyze.py. Here's the full script — we'll walk through each part:

pythonanalyze.py
import pandas as pd

# Load the data
df = pd.read_csv("sales.csv")

# See the shape of the data
print(f"Loaded {len(df)} rows and {len(df.columns)} columns")
print(df.head())  # First 5 rows

# Total revenue by region
by_region = df.groupby("region")["revenue"].sum().sort_values(ascending=False)
print("\n--- Revenue by Region ---")
print(by_region)

# Top performers (revenue > $50K)
top_reps = df[df["revenue"] > 50000][["rep", "region", "revenue"]]
print("\n--- Top Performers ---")
print(top_reps.to_string(index=False))

# Summary stats
print(f"\nTotal revenue: ${df['revenue'].sum():,}")
print(f"Average deal: ${df['revenue'].mean():,.0f}")
print(f"Best region: {by_region.index[0]}")

Run it from your terminal:

bash
python analyze.py

What just happened: You loaded a dataset, grouped it by region, filtered for top performers, and calculated summary stats — in about 15 lines. The equivalent Excel workflow involves pivot tables, filters, and manual number-checking. The Python version runs in under a second and works on any size dataset.

5
Key Concepts

What you just learned

import pandas as pd — Loads the pandas library. The pd is just a shorthand name everyone uses.

pd.read_csv("file.csv") — Reads any CSV file into a DataFrame. A DataFrame is like an Excel sheet you can program.

df.groupby("column")["other_column"].sum() — Group rows by a category and aggregate. Same as a pivot table, one line.

df[df["column"] > value] — Filter rows by a condition. Same as Excel's filter, but scriptable.

df["column"].sum() — Sum a column. Works on millions of rows without crashing.

Day 1 Complete

  • Installed Python and pandas
  • Loaded a CSV file into a pandas DataFrame
  • Calculated revenue totals by region with groupby
  • Filtered for top performers with a condition
  • Printed formatted summary statistics
Day 1 Done

Next: Pandas Fundamentals

Day 2 covers the four operations that handle 90% of real data analysis: loading messy data, cleaning it, filtering it, and grouping it properly.

Go to Day 2
Finished this lesson?