
04 · Regularization / Bias-Variance

2 concepts · Ridge and Lasso compared side by side as the penalty strength λ varies

Coefficient Shrinkage

How Ridge and Lasso shrink coefficients differently, side by side, at λ = 0.30 (on a scale from 0, no penalty, to a heavy penalty).

Coefficient   Ridge (L2)   Lasso (L1)
income         2.15         2.65
age            1.46         1.75
debt          -1.15        -1.35
tenure         0.62         0.65
score          0.23         0.15
history        0.04         0

Lasso: 1 coefficient driven to exactly zero — automatic feature selection

Metric                       Value
Ridge penalty (Σβ²)          8.540
Lasso penalty (Σ|β|)         6.550
Ridge active coefficients    6/6
Lasso active coefficients    5/6
Mild regularization: Ridge dampens all coefficients proportionally — all 6 remain active. Lasso is beginning to zero out the weakest predictors while keeping strong ones nearly intact. This is the sweet spot zone — variance drops faster than bias increases.
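The contrast above can be reproduced in a few lines of scikit-learn. This is a minimal sketch on synthetic data — the feature names, coefficient values, and λ scaling are assumptions, not the page's actual dataset (scikit-learn also calls the penalty strength `alpha` rather than λ, and scales it differently for Ridge and Lasso):

```python
# Sketch: Ridge keeps every coefficient nonzero; Lasso zeros out the weakest.
# Synthetic data — the true coefficients loosely mirror the page's pattern.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p))
beta = np.array([2.5, 1.8, -1.4, 0.6, 0.2, 0.05])  # a few strong, some weak
y = X @ beta + rng.normal(scale=1.0, size=n)

lam = 0.30  # mild penalty (sklearn's alpha; scaling differs from the page's λ)
ridge = Ridge(alpha=lam).fit(X, y)
lasso = Lasso(alpha=lam).fit(X, y)

print("Ridge:", np.round(ridge.coef_, 2))  # all six shrunk but nonzero
print("Lasso:", np.round(lasso.coef_, 2))  # weakest driven to exactly 0
print("Lasso zeros:", int(np.sum(lasso.coef_ == 0)))
```

The L1 penalty's soft-thresholding is what produces exact zeros; the L2 penalty only rescales coefficients toward zero, so Ridge never performs feature selection on its own.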

Bias-Variance Tradeoff

As λ increases, bias rises and variance falls. The test error curve reveals the optimal balance.

[Chart: train error, test error, bias², and variance as functions of λ, from 0 (no penalty) to a heavy penalty; shown at λ = 0.30]
Metric            Value
Train error       0.337
Test error        0.853
Gap (variance)    0.515
Active features   6/6
Bias²             0.337
Variance          0.515
Sweet spot: Bias² = 0.337 (rising), Variance = 0.515 (falling). The variance reduction still outweighs the bias increase, and test error ≈ bias² + variance (0.337 + 0.515 ≈ 0.853) is near its minimum.
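The U-shaped test-error curve can be traced by sweeping λ and scoring a held-out set. A minimal sketch, again on synthetic data with an assumed λ grid (not the page's experiment):

```python
# Sketch: sweep the ridge penalty and locate the test-error minimum.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n, p = 60, 6  # small n, noisy y, so regularization visibly helps
X = rng.normal(size=(n, p))
beta = np.array([2.5, 1.8, -1.4, 0.6, 0.2, 0.05])
y = X @ beta + rng.normal(scale=2.0, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=1)

lams = np.logspace(-3, 2, 30)  # assumed grid of penalty strengths
train_err, test_err = [], []
for lam in lams:
    m = Ridge(alpha=lam).fit(X_tr, y_tr)
    train_err.append(mean_squared_error(y_tr, m.predict(X_tr)))
    test_err.append(mean_squared_error(y_te, m.predict(X_te)))

best = lams[int(np.argmin(test_err))]
print(f"λ minimizing test error: {best:.3g}")
# Train error rises monotonically with λ (bias² grows); the gap between
# test and train error shrinks as variance falls.
```

Picking λ at the test-error minimum is exactly the sweet spot the panel describes; in practice you would use cross-validation (e.g. `RidgeCV`) rather than a single split.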