A model comparison report showing accuracy, precision, recall, and training time for 5 algorithms, plus a tuned best model found via GridSearchCV — with a fully reproducible experiment script.
Compare Multiple Algorithms
Never just try one model. The best algorithm depends on your data. Spend 10 minutes comparing 5 and you'll usually find one that's clearly better.
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
import time
import pandas as pd
models = {
'Logistic Regression': LogisticRegression(max_iter=1000),
'Decision Tree': DecisionTreeClassifier(random_state=42),
'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
'Gradient Boosting': GradientBoostingClassifier(random_state=42),
'SVM': SVC(probability=True, random_state=42)
}
# Assumes X_train, y_train from an earlier train/test split
results = []
for name, model in models.items():
start = time.time()
scores = cross_val_score(model, X_train, y_train, cv=5, scoring='f1')
elapsed = time.time() - start
results.append({
'Model': name,
'CV F1 Mean': scores.mean(),
'CV F1 Std': scores.std(),
'Train Time (s)': round(elapsed, 2)
})
results_df = pd.DataFrame(results).sort_values('CV F1 Mean', ascending=False)
print(results_df.to_string())

GridSearchCV and RandomizedSearchCV
Hyperparameters are settings you configure before training (like n_estimators in Random Forest). GridSearch tries every combination. RandomizedSearch samples from distributions — faster for large spaces.
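RandomizedSearchCV uses the same API as GridSearchCV; here's a minimal sketch of the randomized variant (the parameter ranges and synthetic data are illustrative assumptions, not tuned values):

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Illustrative data; substitute your own X_train, y_train
X_train, y_train = make_classification(n_samples=300, random_state=42)

param_distributions = {
    'n_estimators': randint(50, 300),    # sampled from a range, not enumerated
    'max_depth': [None, 5, 10, 20],
    'min_samples_split': randint(2, 11),
}

rs = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=10,        # only 10 sampled combinations x 5 folds = 50 fits
    cv=5,
    scoring='f1',
    random_state=42,
    n_jobs=-1,
)
rs.fit(X_train, y_train)
print(f"Best params: {rs.best_params_}")
print(f"Best CV F1: {rs.best_score_:.3f}")
```

With `n_iter=10` you cover a much larger space than an 18-cell grid at about half the fit count.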
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
# Grid search (exhaustive — try every combination)
param_grid = {
'n_estimators': [50, 100, 200],
'max_depth': [None, 5, 10],
'min_samples_split': [2, 5],
}
# 3 x 3 x 2 = 18 combinations x 5 folds = 90 fits
gs = GridSearchCV(
RandomForestClassifier(random_state=42),
param_grid,
cv=5,
scoring='f1',
n_jobs=-1, # use all CPU cores
verbose=1
)
gs.fit(X_train, y_train)
print(f"Best params: {gs.best_params_}")
print(f"Best CV F1: {gs.best_score_:.3f}")
# Best model is already fitted
best_model = gs.best_estimator_
from sklearn.metrics import f1_score
print(f"Test F1: {f1_score(y_test, best_model.predict(X_test)):.3f}")

Cross-Validation Deep Dive
Cross-validation gives you a much more reliable accuracy estimate than a single train/test split. K-fold splits data into K parts, trains on K-1, tests on 1, and rotates.
from sklearn.model_selection import StratifiedKFold, cross_validate
# Stratified: preserves class ratio in each fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# Multiple metrics at once
results = cross_validate(
best_model, X_train, y_train,
cv=cv,
scoring=['accuracy', 'precision', 'recall', 'f1'],
return_train_score=True
)
for metric in ['test_accuracy', 'test_precision', 'test_recall', 'test_f1']:
scores = results[metric]
print(f"{metric}: {scores.mean():.3f} +/- {scores.std():.3f}")
# High train score + low test score = overfitting
print(f"Train acc: {results['train_accuracy'].mean():.3f}")

Overfit check: if train accuracy is 0.99 and test accuracy is 0.80, your model memorized the training data instead of learning patterns. Reduce model complexity or add more data.
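To see that gap concretely, here's a small sketch (synthetic, noisy data; the dataset sizes are assumptions) where an unconstrained tree memorizes the training set while a depth-limited one generalizes better:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small, noisy synthetic dataset makes memorization easy to trigger
X, y = make_classification(n_samples=200, n_features=20, flip_y=0.2,
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

deep = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)           # no depth limit
shallow = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_tr, y_tr)

print(f"Deep tree:    train={deep.score(X_tr, y_tr):.2f}  test={deep.score(X_te, y_te):.2f}")
print(f"Shallow tree: train={shallow.score(X_tr, y_tr):.2f}  test={shallow.score(X_te, y_te):.2f}")
```

The deep tree hits perfect training accuracy but drops noticeably on the test set; capping `max_depth` trades a little training fit for a smaller train/test gap.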
When to Use Which Model
Choosing the right algorithm saves hours. Here's a quick reference for the five models above.
- Logistic Regression: fast, interpretable baseline; works best when classes are roughly linearly separable
- Decision Tree: easy to explain and handles mixed feature types, but overfits easily on its own
- Random Forest: strong default for tabular data; robust with little tuning
- Gradient Boosting: often the top performer on tabular data, but slower and more sensitive to tuning
- SVM: solid on small-to-medium datasets; needs feature scaling and gets slow as data grows
What You Learned Today
- Compared 5 ML algorithms using 5-fold cross-validation on the same dataset
- Used GridSearchCV to exhaustively tune hyperparameters and find the best combination
- Applied StratifiedKFold to ensure class balance across folds
- Identified overfitting by comparing train vs test scores
Go Further on Your Own
- Try XGBoost: pip install xgboost, from xgboost import XGBClassifier — often beats sklearn's GradientBoosting
- Speed up tuning with RandomizedSearchCV (n_iter controls how many combinations are sampled) and keep the test set as a one-time final holdout
- Visualize the cross-validation results as a box plot showing score distribution across folds
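For the box-plot idea above, a minimal sketch using per-fold scores (the two-model lineup and synthetic data are assumptions; extend it to all five models):

```python
import matplotlib
matplotlib.use('Agg')  # render to file, no display needed
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, random_state=42)
models = {
    'LogReg': LogisticRegression(max_iter=1000),
    'RandomForest': RandomForestClassifier(n_estimators=100, random_state=42),
}

# One array of 5 fold scores per model; the spread shows score stability
fold_scores = {name: cross_val_score(m, X, y, cv=5, scoring='f1')
               for name, m in models.items()}

plt.boxplot(list(fold_scores.values()))
plt.xticks(range(1, len(fold_scores) + 1), list(fold_scores.keys()))
plt.ylabel('CV F1 per fold')
plt.savefig('cv_boxplot.png')
```

A wide box means the score depends heavily on which rows land in which fold, which is itself a warning sign.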
Nice work. Keep going.
Day 4 is ready when you are.
Continue to Day 4