Key Takeaways
- Decompose before forecasting: Split your time series into trend, seasonality, and residuals before choosing a model. Understanding which components are present determines the best approach.
- Prophet is the best starting point: Facebook's Prophet handles seasonality, holidays, and trend changepoints automatically. It is the most productive starting tool for business forecasting.
- Never use random splits for time series: Always split chronologically — earlier dates for training, later for testing. Random splits allow future data to leak into training, producing inflated performance estimates.
- ML with lagged features often beats ARIMA: For complex time series with multiple drivers, XGBoost with lagged features and date features often outperforms ARIMA. Compare both on a proper validation set.
Time series forecasting is one of the most valuable skills in data science because almost every business decision involves predicting something that changes over time. This guide covers decomposition, Prophet, and ML-based forecasting with Python, with guidance on when ARIMA is the better fit.
Decomposition
```python
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

df = pd.read_csv('sales.csv', parse_dates=['date'], index_col='date')

# period=52 assumes weekly observations with annual seasonality
result = seasonal_decompose(df['revenue'], model='additive', period=52)
result.plot()
```
Trend: Long-run direction (growing, declining, flat).
Seasonality: Repeating patterns (weekly, monthly, annual).
Residuals: Remaining random noise after removing trend and seasonality.
Prophet Forecasting
```python
import pandas as pd
from prophet import Prophet

# Prophet requires columns named 'ds' (date) and 'y' (value);
# move the DatetimeIndex back into a column before renaming
df = df.reset_index().rename(columns={'date': 'ds', 'revenue': 'y'})

model = Prophet(
    yearly_seasonality=True,
    weekly_seasonality=True,
    changepoint_prior_scale=0.05,
)
model.add_country_holidays(country_name='US')
model.fit(df)

future = model.make_future_dataframe(periods=90)
forecast = model.predict(future)
fig = model.plot(forecast)
```
ML Approach: XGBoost with Lagged Features
```python
import pandas as pd
from xgboost import XGBRegressor
from sklearn.metrics import mean_absolute_error

# Lagged features use only past values, so they are available at prediction time
df['lag_7'] = df['revenue'].shift(7)
df['lag_28'] = df['revenue'].shift(28)
# Shift by 1 before rolling so the window excludes the current day's
# revenue -- including it would leak the target into a feature
df['rolling_7_mean'] = df['revenue'].shift(1).rolling(7).mean()
df['day_of_week'] = df.index.dayofweek
df['month'] = df.index.month

# Chronological train/test split (NEVER random)
split = '2025-10-01'
train = df[df.index < split].dropna()
test = df[df.index >= split].dropna()

features = ['lag_7', 'lag_28', 'rolling_7_mean', 'day_of_week', 'month']
model = XGBRegressor(n_estimators=300, random_state=42)
model.fit(train[features], train['revenue'])

preds = model.predict(test[features])
print(f"MAE: {mean_absolute_error(test['revenue'], preds):.2f}")
```
Evaluating Forecasts
Always compare against a naive baseline (last value, same period last year). If your model cannot beat the naive forecast, it provides no value.
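Both baselines are one line of pandas each. A sketch on invented daily data (the series and dates below are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Assumed daily revenue series purely for illustration
rng = np.random.default_rng(1)
revenue = pd.Series(
    200 + rng.normal(0, 10, 120),
    index=pd.date_range('2025-07-01', periods=120, freq='D'),
)
test = revenue['2025-10-01':]

# Naive baseline 1: carry yesterday's value forward
naive_last = revenue.shift(1)['2025-10-01':]
# Naive baseline 2 (seasonal): same weekday last week
naive_week = revenue.shift(7)['2025-10-01':]

mae_last = (test - naive_last).abs().mean()
mae_week = (test - naive_week).abs().mean()
print(f"last-value MAE: {mae_last:.2f}, seasonal-naive MAE: {mae_week:.2f}")
```

Whichever baseline MAE is lower becomes the bar your real model has to clear.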
Metrics: MAE (mean absolute error, interpretable in original units), MAPE (mean absolute percentage error, easy to communicate), RMSE (penalizes large errors more than MAE).
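All three metrics are available in (or one step from) scikit-learn. A small worked example with made-up numbers:

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
)

# Toy values chosen to make the arithmetic easy to check
y_true = np.array([100.0, 110.0, 95.0, 120.0])
y_pred = np.array([102.0, 108.0, 99.0, 110.0])

mae = mean_absolute_error(y_true, y_pred)              # same units as the data
mape = mean_absolute_percentage_error(y_true, y_pred)  # fraction; x100 for percent
rmse = np.sqrt(mean_squared_error(y_true, y_pred))     # penalizes large errors

print(f"MAE={mae:.2f}  MAPE={mape:.3f}  RMSE={rmse:.2f}")
```

Note that RMSE is at least MAE on the same errors, and the gap widens when a few errors are much larger than the rest.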
Use TimeSeriesSplit from scikit-learn for cross-validation — it creates folds that preserve chronological order. Never use regular KFold on time series.
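A quick sketch of what TimeSeriesSplit produces: every fold trains on an earlier window and tests on the window immediately after it.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Ten observations in chronological order
X = np.arange(10).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
splits = list(tscv.split(X))
for train_idx, test_idx in splits:
    # Training indices always precede test indices
    print(f"train={train_idx}  test={test_idx}")
```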
Frequently Asked Questions
What is stationarity in time series?
A stationary time series has constant mean and variance over time. Many classical forecasting models (ARIMA) require stationarity. Test with the Augmented Dickey-Fuller test (from statsmodels). If non-stationary, apply differencing (subtract the previous value from each value) until the ADF test indicates stationarity.
When should I use Prophet vs ARIMA?
Use Prophet for business time series with multiple seasonalities, holiday effects, and trend changepoints. It is easy to use and its components are easy to interpret. Use ARIMA for simpler time series where the statistical model needs to be transparent and well-specified. In practice, try both and compare on a validation set.
How do I avoid data leakage in time series?
Never use random train/test splits. Always split chronologically — earlier dates for training, later dates for testing. Use TimeSeriesSplit from scikit-learn for cross-validation. Ensure lagged features use only past values, not any information that would not be available at prediction time.
What is the naive forecast and why does it matter?
A naive forecast predicts the next value as equal to the last observed value (or the same period last year for seasonal data). It is the simplest possible baseline. Your model must outperform the naive forecast to be useful. A model that cannot beat last week's sales as a prediction for this week provides no value.
Forecasting turns data into competitive advantage. Get the skills.
Join professionals from Denver, NYC, Dallas, LA, and Chicago for two days of hands-on AI and tech training. $1,490. October 2026. Seats are limited.
Reserve Your Seat
Note: Information reflects early 2026.