Course OverviewAll CoursesBlog Reserve Bootcamp Seat
Pandas Masterclass · Day 4 of 5 ~70 minutes

Time Series and Advanced Indexing

Parse dates, resample by period, calculate rolling stats, and work with DatetimeIndex for financial and operational data.

1
Day 1
2
Day 2
3
Day 3
4
Day 4
5
Day 5
What You'll Build Today

A time series analysis script: parse a date column, resample daily data to monthly, compute 7-day and 30-day rolling averages, calculate month-over-month growth, and detect anomalies using rolling standard deviation.

1
Dates

Parsing and working with dates

python
import pandas as pd

# Parse dates on load
df = pd.read_csv("sales.csv", parse_dates=["order_date"])

# Or convert after load
df["order_date"] = pd.to_datetime(df["order_date"])

# Handle non-standard formats
df["order_date"] = pd.to_datetime(df["order_date"], format="%d/%m/%Y")

# Extract date parts
df["year"]  = df["order_date"].dt.year
df["month"] = df["order_date"].dt.month
df["dow"]   = df["order_date"].dt.day_name()

# Days since a reference date
df["days_since_launch"] = (
    df["order_date"] - pd.Timestamp("2024-01-01")
).dt.days
2
DatetimeIndex

Setting a DatetimeIndex

Setting the date column as the index unlocks powerful time-based slicing and resampling.

python
# Set the date as index
df = df.set_index("order_date").sort_index()

# Slice by date string — very readable
q1 = df["2024-01":"2024-03"]
jan = df["2024-01"]
after_june = df["2024-06":]
3
Resample

Resampling — change time frequency

resample() aggregates data to a different time period. Daily → monthly. Hourly → daily.

python
# Resample daily data to monthly totals
monthly = df["revenue"].resample("ME").sum()

# Weekly average
weekly_avg = df["revenue"].resample("W").mean()

# Quarterly stats
quarterly = df["revenue"].resample("QE").agg(["sum", "mean", "count"])

# Common frequency aliases:
# "D" = day, "W" = week, "ME" = month-end
# "QE" = quarter-end, "YE" = year-end, "h" = hour
4
Rolling

Rolling averages and statistics

pythonrolling.py
# 7-day rolling average
df["revenue_7d"] = df["revenue"].rolling(7).mean()

# 30-day rolling sum
df["revenue_30d"] = df["revenue"].rolling(30).sum()

# Rolling standard deviation — for anomaly detection
df["rolling_std"] = df["revenue"].rolling(30).std()

# Flag values more than 2 std deviations from rolling mean
rolling_mean = df["revenue"].rolling(30).mean()
df["anomaly"] = (df["revenue"] - rolling_mean).abs() > (2 * df["rolling_std"])

# Month-over-month growth
monthly = df["revenue"].resample("ME").sum()
monthly_pct = monthly.pct_change() * 100
print(monthly_pct.round(1))

Real use case: Rolling averages smooth out noise. If your daily revenue swings wildly (weekends vs. weekdays), a 7-day rolling average shows you the real trend. Every operations dashboard uses this.

Practice Challenge

Analyze a time series dataset

  • Find any time series CSV (stock prices, website traffic, sales data)
  • Parse the date column and set it as the index
  • Resample to monthly totals and plot with .plot()
  • Add a 30-day rolling average column
  • Calculate month-over-month percentage change

What You Learned Today

  • Parse date strings with to_datetime() and parse_dates parameter
  • Extract year, month, day, day_of_week with .dt accessor
  • Set DatetimeIndex for clean date-based slicing
  • Resample daily data to weekly, monthly, quarterly
  • Calculate rolling averages, sums, and std for trend analysis
  • Detect anomalies with rolling mean + std deviation
Course Progress
Day 4 of 5 — 80%
Day 4 Complete

Day 5: Performance

The final day: vectorization for 100x speedups, chunking for datasets larger than RAM, and profiling slow code.

Start Day 5
Finished this lesson?