Day 5 of 5
⏱ ~60 minutes
Statistics for Data Science — Day 5

Statistics in Machine Learning — Connecting Theory to Practice

See how the statistics you've learned powers machine learning: bias-variance tradeoff, cross-validation, evaluation metrics, and model selection.

Statistics in Machine Learning

Machine learning didn't replace statistics — it extended it. Every concept from this week appears in how ML models are trained, evaluated, and deployed. This final lesson connects the dots.

The Bias-Variance Tradeoff

Bias is systematic error: your model makes the same kind of mistake regardless of the training sample (underfitting). Variance is sensitivity to the training data: your model memorizes noise and fails on new data (overfitting). Expected test error decomposes as bias² + variance + irreducible noise, so the goal is the model complexity that minimizes the total, not either term alone.

Python — Demonstrating the bias-variance tradeoff
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

np.random.seed(42)
X = np.linspace(0, 1, 100).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + np.random.normal(0, 0.3, 100)

# X is sorted, so shuffle before folding — otherwise each fold tests
# on a contiguous segment and every model is forced to extrapolate
cv = KFold(n_splits=5, shuffle=True, random_state=42)

for degree in [1, 3, 10, 20]:
    model = Pipeline([
        ('poly', PolynomialFeatures(degree)),
        ('ridge', Ridge(alpha=0.001))
    ])
    scores = cross_val_score(model, X, y, cv=cv, scoring='neg_mean_squared_error')
    cv_mse = -scores.mean()  # low degree underfits (bias); high degree overfits (variance)
    print(f"Degree {degree:2d}: CV MSE = {cv_mse:.4f}")

Cross-Validation: Honest Model Evaluation

Never evaluate a model on the data it was trained on. K-fold cross-validation splits the data into K groups, trains on K−1 of them, tests on the held-out group, and rotates, giving you K out-of-sample test scores to average.

Python — Proper model evaluation
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Imbalanced binary classification data (the regression X, y above
# won't work with a classifier)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

# Stratified K-fold preserves class balance in each fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)

results = cross_validate(
    model, X, y, cv=cv,
    scoring=['accuracy', 'roc_auc', 'f1'],
    return_train_score=True
)

print("CV Accuracy:  {:.3f} ± {:.3f}".format(
    results['test_accuracy'].mean(),
    results['test_accuracy'].std()
))
print("Train-test gap: {:.3f}".format(
    results['train_accuracy'].mean() - results['test_accuracy'].mean()
))  # A large gap signals overfitting

Evaluation Metrics and When to Use Them

Accuracy: Good when classes are balanced. Misleading with imbalanced data: a model that always predicts "not fraud" on 99%-legitimate transactions scores 99% accuracy while catching zero fraud.

Precision: Of all positives predicted, how many are actually positive? Use when false positives are costly (spam filter).

Recall: Of all actual positives, how many did we find? Use when false negatives are costly (cancer detection).

AUC-ROC: Ranking quality across all classification thresholds. A solid single-number summary for binary classifiers, though under severe imbalance the precision-recall curve is often more informative.
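The fraud example above is easy to verify directly. A minimal sketch, using a hypothetical label vector with 1% fraud, shows an all-negative "model" scoring 99% accuracy while catching nothing:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: 10 fraud cases among 1,000 transactions (1%)
y_true = np.zeros(1000, dtype=int)
y_true[:10] = 1

# A useless "model" that predicts "not fraud" for everything
y_pred = np.zeros(1000, dtype=int)

print(accuracy_score(y_true, y_pred))  # 0.99 — looks great
print(recall_score(y_true, y_pred))    # 0.0 — catches zero fraud
```

Recall exposes immediately what accuracy hides: the model never finds a single positive case.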

💡
Use F1 score when you need to balance precision and recall. F1 = 2 × (precision × recall) / (precision + recall). It's the harmonic mean — it punishes extreme imbalance between the two.
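To see the harmonic-mean penalty in action, here is a small sketch comparing F1 for a balanced versus a lopsided precision/recall pair (the numbers are illustrative):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Balanced pair: F1 matches both values
print(f1(0.8, 0.8))   # 0.8

# Lopsided pair: F1 collapses toward the weaker metric,
# far below the arithmetic mean of 0.545
print(f1(0.99, 0.1))  # ~0.182
```

This is why F1 is a better single score than a plain average: a model cannot buy a high F1 by maximizing one metric while ignoring the other.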
Day 5 Capstone Exercise
Evaluate a Model the Right Way
  1. Train a classification model on an imbalanced dataset (fraud, churn)
  2. Report accuracy — notice how misleadingly high it is
  3. Calculate precision, recall, F1, and AUC-ROC
  4. Run 5-fold cross-validation and report mean ± std for each metric
  5. Check the train vs test gap — is the model overfitting?
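One way to work through these steps, sketched with scikit-learn's synthetic `make_classification` standing in for a real fraud or churn dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Step 1: imbalanced dataset (5% positive class stands in for fraud/churn)
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# Steps 2–4: all the metrics under 5-fold stratified cross-validation
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = cross_validate(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y, cv=cv,
    scoring=['accuracy', 'precision', 'recall', 'f1', 'roc_auc'],
    return_train_score=True,
)
for metric in ['accuracy', 'precision', 'recall', 'f1', 'roc_auc']:
    scores = results[f'test_{metric}']
    print(f"{metric:>9}: {scores.mean():.3f} ± {scores.std():.3f}")

# Step 5: train-test gap as an overfitting check
gap = results['train_accuracy'].mean() - results['test_accuracy'].mean()
print(f"Train-test accuracy gap: {gap:.3f}")
```

Expect accuracy to look impressive while recall and F1 tell the real story; that contrast is the point of the exercise.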

Day 5 Summary — Statistics for Data Science Course Complete

  • Bias-variance tradeoff: underfitting vs overfitting — cross-validation finds the balance
  • K-fold cross-validation gives honest estimates of out-of-sample performance
  • Accuracy misleads on imbalanced data — use AUC-ROC, F1, precision/recall
  • Train-test gap measures overfitting: large gap means your model memorized noise
  • Statistics is not separate from ML — it's the foundation that makes ML results trustworthy

Want to go deeper in 3 days?

Our in-person AI bootcamp covers advanced AI development, agentic systems, and production deployment. Five cities. $1,490.

Reserve Your Seat →