
04 · Regularization / Bias-Variance

2 concepts · Ridge and Lasso compared side by side as the penalty strength λ varies

Coefficient Shrinkage

How Ridge and Lasso shrink coefficients differently, side by side, at λ = 0.30 (on a scale from 0, no penalty, to a heavy penalty).

Coefficient   Ridge (L2)   Lasso (L1)
income         2.15         2.65
age            1.46         1.75
debt          -1.15        -1.35
tenure         0.62         0.65
score          0.23         0.15
history        0.04         0

Lasso: 1 coefficient driven to exactly zero — automatic feature selection

Metric                       Value
Ridge penalty (Σβ²)          8.540
Lasso penalty (Σ|β|)         6.550
Ridge active coefficients    6/6
Lasso active coefficients    5/6
Mild regularization: Ridge dampens all coefficients proportionally — all 6 remain active. Lasso is beginning to zero out the weakest predictors while keeping strong ones nearly intact. This is the sweet spot zone — variance drops faster than bias increases.
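The contrast above can be reproduced in a few lines of scikit-learn. This is a minimal sketch on synthetic data — the feature names, coefficient values, and λ scaling are assumptions, not the page's actual dataset (scikit-learn also calls the penalty strength `alpha` rather than λ, and scales it differently for Ridge and Lasso):

```python
# Sketch: Ridge keeps every coefficient nonzero; Lasso zeros out the weakest.
# Synthetic data — the true coefficients loosely mirror the page's pattern.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p))
beta = np.array([2.5, 1.8, -1.4, 0.6, 0.2, 0.05])  # a few strong, some weak
y = X @ beta + rng.normal(scale=1.0, size=n)

lam = 0.30  # mild penalty (sklearn's alpha; scaling differs from the page's λ)
ridge = Ridge(alpha=lam).fit(X, y)
lasso = Lasso(alpha=lam).fit(X, y)

print("Ridge:", np.round(ridge.coef_, 2))  # all six shrunk but nonzero
print("Lasso:", np.round(lasso.coef_, 2))  # weakest driven to exactly 0
print("Lasso zeros:", int(np.sum(lasso.coef_ == 0)))
```

The L1 penalty's soft-thresholding is what produces exact zeros; the L2 penalty only rescales coefficients toward zero, so Ridge never performs feature selection on its own.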

Bias-Variance Tradeoff

As λ increases, bias rises and variance falls. The test error curve reveals the optimal balance.

[Chart: train error, test error, bias², and variance as functions of λ, from 0 (no penalty) to a heavy penalty; shown at λ = 0.30]
Metric            Value
Train error       0.337
Test error        0.853
Gap (variance)    0.515
Active features   6/6
Bias²             0.337
Variance          0.515
Sweet spot: Bias² = 0.337 (rising), Variance = 0.515 (falling). The variance reduction still outweighs the bias increase, and test error ≈ bias² + variance (0.337 + 0.515 ≈ 0.853) is near its minimum.
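The U-shaped test-error curve can be traced by sweeping λ and scoring a held-out set. A minimal sketch, again on synthetic data with an assumed λ grid (not the page's experiment):

```python
# Sketch: sweep the ridge penalty and locate the test-error minimum.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n, p = 60, 6  # small n, noisy y, so regularization visibly helps
X = rng.normal(size=(n, p))
beta = np.array([2.5, 1.8, -1.4, 0.6, 0.2, 0.05])
y = X @ beta + rng.normal(scale=2.0, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=1)

lams = np.logspace(-3, 2, 30)  # assumed grid of penalty strengths
train_err, test_err = [], []
for lam in lams:
    m = Ridge(alpha=lam).fit(X_tr, y_tr)
    train_err.append(mean_squared_error(y_tr, m.predict(X_tr)))
    test_err.append(mean_squared_error(y_te, m.predict(X_te)))

best = lams[int(np.argmin(test_err))]
print(f"λ minimizing test error: {best:.3g}")
# Train error rises monotonically with λ (bias² grows); the gap between
# test and train error shrinks as variance falls.
```

Picking λ at the test-error minimum is exactly the sweet spot the panel describes; in practice you would use cross-validation (e.g. `RidgeCV`) rather than a single split.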