See how the statistics you've learned power machine learning: the bias-variance tradeoff, cross-validation, evaluation metrics, and model selection.
Machine learning didn't replace statistics — it extended it. Every concept from this week appears in how ML models are trained, evaluated, and deployed. This final lesson connects the dots.
Bias is systematic error — your model makes the same kind of mistake consistently (underfitting). Variance is sensitivity to the training data — your model memorizes noise and fails on new data (overfitting). The goal is to minimize total error.
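This tradeoff is usually stated as the expected-squared-error decomposition (a standard result, added here for reference; the lesson text describes it informally):

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\mathrm{Bias}\big[\hat{f}(x)\big]^2}_{\text{systematic error}}
  + \underbrace{\mathrm{Var}\big[\hat{f}(x)\big]}_{\text{sensitivity to training data}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Simple models have high bias and low variance; flexible models the reverse. The sweet spot is where the sum is smallest.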
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
np.random.seed(42)
X = np.linspace(0, 1, 100).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + np.random.normal(0, 0.3, 100)
for degree in [1, 3, 10, 20]:
    model = Pipeline([
        ('poly', PolynomialFeatures(degree)),
        ('ridge', Ridge(alpha=0.001))
    ])
    scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')
    cv_mse = -scores.mean()
    print(f"Degree {degree:2d}: CV MSE = {cv_mse:.4f}")

Never evaluate a model on the data it was trained on. K-fold cross-validation splits the data into K folds, trains on K-1 of them, tests on the remaining one, and rotates — giving you K independent test scores.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Binary classification data (the regression X, y above won't work here)
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.7, 0.3], random_state=42)

# Stratified K-fold preserves class balance in each fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
results = cross_validate(
    model, X, y, cv=cv,
    scoring=['accuracy', 'roc_auc', 'f1'],
    return_train_score=True
)
print("CV Accuracy: {:.3f} ± {:.3f}".format(
    results['test_accuracy'].mean(),
    results['test_accuracy'].std()
))
print("Train-Test gap: {:.3f}".format(
    results['train_accuracy'].mean() - results['test_accuracy'].mean()
))  # Large gap = overfitting

Accuracy: Good when classes are balanced. Misleading with imbalanced data (a model that always predicts "not fraud" on 99%-negative data scores 99% accuracy but catches no fraud).
Precision: Of all positives predicted, how many are actually positive? Use when false positives are costly (spam filter).
Recall: Of all actual positives, how many did we find? Use when false negatives are costly (cancer detection).
AUC-ROC: Overall classifier quality across all thresholds. Best single metric for imbalanced binary classification.