A time series analysis script: parse a date column, resample daily data to monthly, compute 7-day and 30-day rolling averages, calculate month-over-month growth, and detect anomalies using rolling standard deviation.
Parsing and working with dates
import pandas as pd
# Parse dates on load
df = pd.read_csv("sales.csv", parse_dates=["order_date"])
# Or convert after load
df["order_date"] = pd.to_datetime(df["order_date"])
# Handle non-standard formats
df["order_date"] = pd.to_datetime(df["order_date"], format="%d/%m/%Y")
# Extract date parts
df["year"] = df["order_date"].dt.year
df["month"] = df["order_date"].dt.month
df["dow"] = df["order_date"].dt.day_name()
# Days since a reference date
df["days_since_launch"] = (
df["order_date"] - pd.Timestamp("2024-01-01")
).dt.days
Setting a DatetimeIndex
Setting the date column as the index unlocks powerful time-based slicing and resampling.
# Set the date as index
df = df.set_index("order_date").sort_index()
# Slice by date string — very readable
q1 = df["2024-01":"2024-03"]
jan = df["2024-01"]
after_june = df["2024-06":]
Resampling — change time frequency
resample() aggregates data to a different time period. Daily → monthly. Hourly → daily.
# Resample daily data to monthly totals
monthly = df["revenue"].resample("ME").sum()
# Weekly average
weekly_avg = df["revenue"].resample("W").mean()
# Quarterly stats
quarterly = df["revenue"].resample("QE").agg(["sum", "mean", "count"])
# Common frequency aliases:
# "D" = day, "W" = week, "ME" = month-end
# "QE" = quarter-end, "YE" = year-end, "h" = hour
Rolling averages and statistics
# 7-day rolling average
df["revenue_7d"] = df["revenue"].rolling(7).mean()
# 30-day rolling sum
df["revenue_30d"] = df["revenue"].rolling(30).sum()
# Rolling standard deviation — for anomaly detection
df["rolling_std"] = df["revenue"].rolling(30).std()
# Flag values more than 2 std deviations from rolling mean
rolling_mean = df["revenue"].rolling(30).mean()
df["anomaly"] = (df["revenue"] - rolling_mean).abs() > (2 * df["rolling_std"])
# Month-over-month growth
monthly = df["revenue"].resample("ME").sum()
monthly_pct = monthly.pct_change() * 100
print(monthly_pct.round(1))
Real use case: Rolling averages smooth out noise. If your daily revenue swings wildly (weekends vs. weekdays), a 7-day rolling average shows you the real trend. Every operations dashboard uses this.
Analyze a time series dataset
- Find any time series CSV (stock prices, website traffic, sales data)
- Parse the date column and set it as the index
- Resample to monthly totals and plot with
.plot() - Add a 30-day rolling average column
- Calculate month-over-month percentage change
What You Learned Today
- Parse date strings with to_datetime() and parse_dates parameter
- Extract year, month, day, day_of_week with .dt accessor
- Set DatetimeIndex for clean date-based slicing
- Resample daily data to weekly, monthly, quarterly
- Calculate rolling averages, sums, and std for trend analysis
- Detect anomalies with rolling mean + std deviation
Day 5: Performance
The final day: vectorization for 100x speedups, chunking for datasets larger than RAM, and profiling slow code.
Start Day 5