01 / Gradient Descent
4 concepts · Drag sliders and run simulations
Why Gradients Point Downhill
Drag the dot on the curve (or use the slider). The dashed yellow line is the tangent — its slope IS the gradient. Watch how the gradient arrow always points uphill, and the negative gradient points downhill.
Position (w): 7.0 · Loss: 8.50 · Gradient (slope): +4.0 · −Gradient: −4.0
Gradient = +4.0 — the tangent line tilts uphill to the right (slope = +4.0). Negative gradient points LEFT.
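The readouts above can be reproduced in a few lines. A minimal sketch, assuming the loss curve is the quadratic L(w) = (w − 3)²/2 + 0.5 (a guess chosen only because it matches the readouts L(7) = 8.5 and L′(7) = +4.0; the demo's actual curve may differ):

```python
# Assumed loss curve: a guess that reproduces the demo's readouts,
# L(7.0) = 8.5 and L'(7.0) = +4.0. The real demo may use a different curve.
def loss(w):
    return (w - 3.0) ** 2 / 2 + 0.5

def gradient(w):
    return w - 3.0  # derivative of (w - 3)^2 / 2

w = 7.0
g = gradient(w)       # +4.0: tangent tilts uphill to the right
eta = 0.3
w_new = w - eta * g   # 7.0 - 0.3 * 4.0 = 5.8: a step LEFT, i.e. downhill
print(loss(w), g, w_new)  # 8.5 4.0 5.8
```

Because the slope is positive, subtracting it moves w left; a negative slope would move w right. Either way the step goes downhill.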
w_new = 7.0 − η × 4.0 moves w left. That's downhill.

Learning Rate
Watch gradient descent step by step. Adjust η to see overshooting vs slow convergence, or compare all three at once.
η: 0.30 · Step: 0 · w: 7.000 · Loss: 8.500 · Gradient: +4.000
Hit “1 step” to start. Try η=0.30 first, then 0.02 (slow) and 1.00 (overshoots). Or hit “Compare All 3” to see them side by side.
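The three regimes are easy to see on a toy quadratic. This sketch assumes loss(w) = w² (minimum at w = 0) purely for illustration; it is not necessarily the demo's curve:

```python
def run(eta, steps=20, w=7.0):
    """Gradient descent on the assumed toy loss L(w) = w^2 (grad = 2w)."""
    for _ in range(steps):
        w -= eta * 2 * w          # w_new = w_old - eta * gradient
    return w

for eta in (0.02, 0.30, 1.00):
    print(f"eta={eta:.2f}: w after 20 steps = {run(eta):+.4f}")
# eta=0.02 crawls (w shrinks by only 4% per step), eta=0.30 converges fast,
# and eta=1.00 overshoots past the minimum on every step: w just flips sign
# between +7 and -7 forever. Anything above 1.00 diverges outright on this curve.
```

Each step multiplies w by (1 − 2η), which is the whole story on a quadratic: |1 − 2η| < 1 converges, |1 − 2η| = 1 oscillates forever, |1 − 2η| > 1 diverges.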
Batch vs SGD vs Mini-batch
Select a variant and run it. Each starts from the same point (top-right) heading to the minimum (center). Adjust the noise slider to see the full spectrum from clean to chaotic.
Noise: 0.70
Select a variant and hit “Run selected.”
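The three variants differ only in how much data feeds each gradient estimate. A sketch on an assumed toy task (fit a single parameter w to the mean of a dataset by minimizing L(w) = mean((w − xᵢ)²)/2, whose per-sample gradient is w − xᵢ; the data and settings here are illustrative, not the demo's):

```python
import random

random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(256)]  # toy dataset, mean near 5

def grad_estimate(w, batch):
    # Average per-sample gradient of L(w) = mean((w - x)^2) / 2 over the batch
    return sum(w - x for x in batch) / len(batch)

def run(batch_size, eta=0.1, steps=200, w=0.0):
    for _ in range(steps):
        batch = data if batch_size is None else random.sample(data, batch_size)
        w -= eta * grad_estimate(w, batch)
    return w

print("batch     :", run(None))  # exact gradient: smooth, clean trajectory
print("SGD       :", run(1))     # one sample per step: cheap but noisy
print("mini-batch:", run(32))    # the usual compromise between the two
```

All three land near the data mean; the batch size only controls how jittery the path is on the way there, which is exactly what the noise slider visualizes.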
GD vs Gradient Boosting
Same algorithm, different spaces. Shared controls advance both simultaneously — watch how each “step” works side by side.
w_new = w_old − η × gradient (moving a parameter downhill)
F_new = F_old + η × tree(residuals) (adding a tree that corrects errors)
Gradient Descent: w = 7.000 · Loss = 8.500 · Gradient = +4.000
Gradient Boosting: MSE = 17.429 · Max |residual| = 6.00
Step: 0 · η: 0.30
Side-by-side comparison. Hit “1 step” to advance both simultaneously. GD adjusts a single parameter w to minimize loss. GB adds a tree to correct prediction errors. Same η, same number of steps — watch both converge.
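The side-by-side loop can be sketched in a few lines. Everything here is illustrative: a made-up dataset, an assumed quadratic gradient for the GD panel, and the simplest possible "tree" (a depth-0 stump that predicts the mean residual):

```python
ys = [1.0, 3.0, 2.0, 8.0, 7.0, 9.0]   # made-up targets, not the demo's data

def gd_step(w, eta):
    # w_new = w_old - eta * gradient, with an assumed gradient of (w - 3)
    return w - eta * (w - 3.0)

def gb_step(preds, eta):
    # F_new = F_old + eta * tree(residuals); the "tree" is a depth-0 stump
    residuals = [y - p for y, p in zip(ys, preds)]
    stump = sum(residuals) / len(residuals)
    return [p + eta * stump for p in preds]

w, preds, eta = 7.0, [0.0] * len(ys), 0.3
for step in range(10):                 # same eta, same number of steps
    w = gd_step(w, eta)                # GD: move one parameter downhill
    preds = gb_step(preds, eta)        # GB: add a tree that corrects errors
mse = sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)
print(f"w = {w:.3f}, MSE = {mse:.3f}")
```

The parallel is exact: GD takes steps in parameter space, GB takes steps in function space, and in both cases η scales how much of each step is kept.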