Day 03 Applied Practice

Model Selection and Hyperparameter Tuning

Compare multiple ML algorithms, tune their hyperparameters with cross-validation, and pick the best model with statistical confidence.

~1 hour · Hands-on · Precision AI Academy

Today’s Objective

Compare multiple ML algorithms, tune their hyperparameters with cross-validation, and pick the best model with statistical confidence.

Deliverable: a model comparison report showing accuracy, precision, recall, and training time for 5 algorithms, plus a tuned best model found via GridSearchCV — with a fully reproducible experiment script.

Compare Multiple Algorithms

Never try just one model: the best algorithm depends on your data. Spend ten minutes comparing five and you'll usually find one that is clearly better.

compare_models.py
PYTHON
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
import pandas as pd
import time

models = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
    'Gradient Boosting': GradientBoostingClassifier(random_state=42),
    'SVM': SVC(probability=True),
}

results = []
for name, model in models.items(): start = time.time() scores = cross_val_score(model, X_train, y_train, cv=5, scoring='f1') elapsed = time.time() - start results.append({ 'Model': name, 'CV F1 Mean': scores.mean(), 'CV F1 Std': scores.std(), 'Train Time (s)': round(elapsed, 2) })

results_df = pd.DataFrame(results).sort_values('CV F1 Mean', ascending=False)
print(results_df.to_string())
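The objective above calls for picking the best model with statistical confidence, which the comparison table alone doesn't give you. One hedged way to sketch that (not part of the lesson script) is a paired t-test on fold-wise scores, using the same CV folds for both models; the synthetic dataset below stands in for your real X_train and y_train:

```python
# Illustrative sketch: is the gap between two models real, or just fold noise?
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data for demonstration only
X, y = make_classification(n_samples=400, n_features=20, random_state=42)

# Same fold assignments for both models, so the per-fold scores are paired
cv = KFold(n_splits=5, shuffle=True, random_state=42)
rf_scores = cross_val_score(RandomForestClassifier(random_state=42), X, y, cv=cv, scoring='f1')
lr_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv, scoring='f1')

t_stat, p_value = ttest_rel(rf_scores, lr_scores)
print(f"RF mean F1: {rf_scores.mean():.3f}, LR mean F1: {lr_scores.mean():.3f}")
print(f"Paired t-test p-value: {p_value:.3f}")  # small p suggests a real difference
```

With only 5 folds this test has low power, so treat a large p-value as "no evidence of a difference", not proof the models are equal.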

GridSearchCV and RandomizedSearchCV

Hyperparameters are settings you configure before training (like n_estimators in Random Forest). GridSearchCV tries every combination exhaustively; RandomizedSearchCV samples a fixed number of candidates from distributions, which is faster for large search spaces.

tune.py
PYTHON
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Grid search (exhaustive — try every combination)
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5],
}
# 3 x 3 x 2 = 18 combinations x 5 folds = 90 fits

gs = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring='f1',
    n_jobs=-1,  # use all CPU cores
    verbose=1,
)
gs.fit(X_train, y_train)

print(f"Best params: {gs.best_params_}")
print(f"Best CV F1: {gs.best_score_:.3f}")

# Best model is already refitted on the full training set
best_model = gs.best_estimator_

# Note: .score() on a classifier reports accuracy, not F1 — use f1_score
from sklearn.metrics import f1_score
print(f"Test F1: {f1_score(y_test, best_model.predict(X_test)):.3f}")
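The text above mentions RandomizedSearchCV, but the script only shows GridSearchCV. Here is a minimal sketch of the randomized variant; the parameter distributions are illustrative (not tuned recommendations), and synthetic data stands in for X_train and y_train:

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data for demonstration only
X_train, y_train = make_classification(n_samples=300, random_state=42)

# Distributions to sample from, instead of an exhaustive grid
param_dist = {
    'n_estimators': randint(50, 201),     # any integer in [50, 200]
    'max_depth': [None, 5, 10, 20],
    'min_samples_split': randint(2, 11),  # any integer in [2, 10]
}

rs = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_dist,
    n_iter=10,        # 10 sampled combos x 5 folds = 50 fits
    cv=5,
    scoring='f1',
    n_jobs=-1,
    random_state=42,  # makes the sampling reproducible
)
rs.fit(X_train, y_train)

print(f"Best params: {rs.best_params_}")
print(f"Best CV F1: {rs.best_score_:.3f}")
```

Like GridSearchCV, the fitted object exposes best_estimator_, so the downstream code is identical either way.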

Cross-Validation Deep Dive

Cross-validation gives you a much more reliable accuracy estimate than a single train/test split. K-fold splits data into K parts, trains on K-1, tests on 1, and rotates.

cv.py
PYTHON
from sklearn.model_selection import StratifiedKFold, cross_validate

# Stratified: preserves class ratio in each fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Multiple metrics at once
results = cross_validate(
    best_model, X_train, y_train,
    cv=cv,
    scoring=['accuracy', 'precision', 'recall', 'f1'],
    return_train_score=True,
)

for metric in ['test_accuracy', 'test_precision', 'test_recall', 'test_f1']:
    scores = results[metric]
    print(f"{metric}: {scores.mean():.3f} +/- {scores.std():.3f}")

# High train score + low test score = overfitting
print(f"Train acc: {results['train_accuracy'].mean():.3f}")

Overfit check: If train accuracy is 0.99 and test accuracy is 0.80, your model memorized the training data instead of learning patterns. Reduce model complexity or add more data.
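One way to see that gap directly (a sketch on synthetic data, not part of the lesson script) is to compare train and CV accuracy for an unconstrained decision tree versus a depth-limited one:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic data for demonstration only
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

trains, gaps = {}, {}
for depth in [None, 3]:  # unconstrained vs depth-limited
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    res = cross_validate(tree, X, y, cv=5, return_train_score=True)
    trains[depth] = res['train_score'].mean()
    gaps[depth] = trains[depth] - res['test_score'].mean()
    print(f"max_depth={depth}: train={trains[depth]:.3f} "
          f"test={res['test_score'].mean():.3f} gap={gaps[depth]:.3f}")
```

The unconstrained tree memorizes the training set (train accuracy 1.0) and shows the larger train/test gap; capping max_depth is exactly the "reduce model complexity" fix the tip describes.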

When to Use Which Model

Choosing the right algorithm saves hours. Here's a quick reference.

Algorithm Selection Guide
Logistic Regression
Start here. Fast, interpretable, good baseline. Works well with many features.
Random Forest
Usually best out-of-box. Handles mixed types, robust to outliers, shows feature importance.
Gradient Boosting
Often best accuracy. Slower to train. Use XGBoost/LightGBM for production.
SVM
Good for high-dimensional text data. Slow on large datasets. Hard to tune.
Decision Tree
Interpretable and explainable. Overfits easily alone — use as part of an ensemble.
Neural Network
Best for images, text, audio. Needs large data. Overkill for tabular data.

Supporting Resources

Go deeper with these references.

scikit-learn
scikit-learn Documentation: the complete API reference and user guide for scikit-learn estimators.
Kaggle
Kaggle Learn: ML Course, a free hands-on ML course with Jupyter notebooks and datasets.
YouTube
StatQuest with Josh Starmer: clear visual explanations of ML algorithms, widely considered the best free resource.

Day 3 Checkpoint

Before moving on, make sure you can answer these without looking:

Continue to Day 4: Model Evaluation and Interpretation