01 / Gradient Descent
4 concepts · Drag sliders and run simulations
Why Gradients Point Downhill
Drag the dot on the curve (or use the slider). The dashed yellow line is the tangent — its slope IS the gradient. Watch how the gradient arrow always points uphill, and the negative gradient points downhill.
Position (w): 7.0 · Loss: 8.50 · Gradient (slope): +4.0 · −Gradient: −4.0
Gradient = +4.0 — the tangent line tilts uphill to the right (slope = +4.0). Negative gradient points LEFT.
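The readouts above can be reproduced in a few lines. A minimal sketch, assuming the loss curve is the quadratic L(w) = (w − 3)²/2 + 0.5 (a guess chosen only because it matches the readouts L(7) = 8.5 and L′(7) = +4.0; the demo's actual curve may differ):

```python
# Assumed loss curve: a guess that reproduces the demo's readouts,
# L(7.0) = 8.5 and L'(7.0) = +4.0. The real demo may use a different curve.
def loss(w):
    return (w - 3.0) ** 2 / 2 + 0.5

def gradient(w):
    return w - 3.0  # derivative of (w - 3)^2 / 2

w = 7.0
g = gradient(w)       # +4.0: tangent tilts uphill to the right
eta = 0.3
w_new = w - eta * g   # 7.0 - 0.3 * 4.0 = 5.8: a step LEFT, i.e. downhill
print(loss(w), g, w_new)  # 8.5 4.0 5.8
```

Because the slope is positive, subtracting it moves w left; a negative slope would move w right. Either way the step goes downhill.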
w_new = 7.0 − η × 4.0 moves w left. That's downhill.

Learning Rate
Watch gradient descent step by step. Adjust η to see overshooting vs slow convergence, or compare all three at once.
η: 0.30 · Step: 0 · w: 7.000 · Loss: 8.500 · Gradient: +4.000
Hit “1 step” to start. Try η=0.30 first, then 0.02 (slow) and 1.00 (overshoots). Or hit “Compare All 3” to see them side by side.
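The three regimes are easy to see on a toy quadratic. This sketch assumes loss(w) = w² (minimum at w = 0) purely for illustration; it is not necessarily the demo's curve:

```python
def run(eta, steps=20, w=7.0):
    """Gradient descent on the assumed toy loss L(w) = w^2 (grad = 2w)."""
    for _ in range(steps):
        w -= eta * 2 * w          # w_new = w_old - eta * gradient
    return w

for eta in (0.02, 0.30, 1.00):
    print(f"eta={eta:.2f}: w after 20 steps = {run(eta):+.4f}")
# eta=0.02 crawls (w shrinks by only 4% per step), eta=0.30 converges fast,
# and eta=1.00 overshoots past the minimum on every step: w just flips sign
# between +7 and -7 forever. Anything above 1.00 diverges outright on this curve.
```

Each step multiplies w by (1 − 2η), which is the whole story on a quadratic: |1 − 2η| < 1 converges, |1 − 2η| = 1 oscillates forever, |1 − 2η| > 1 diverges.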
Batch vs SGD vs Mini-batch
Select a variant and run it. Each starts from the same point (top-right) heading to the minimum (center). Adjust the noise slider to see the full spectrum from clean to chaotic.
Noise: 0.70
Select a variant and hit “Run selected.”
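The three variants differ only in how much data feeds each gradient estimate. A sketch on an assumed toy task (fit a single parameter w to the mean of a dataset by minimizing L(w) = mean((w − xᵢ)²)/2, whose per-sample gradient is w − xᵢ; the data and settings here are illustrative, not the demo's):

```python
import random

random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(256)]  # toy dataset, mean near 5

def grad_estimate(w, batch):
    # Average per-sample gradient of L(w) = mean((w - x)^2) / 2 over the batch
    return sum(w - x for x in batch) / len(batch)

def run(batch_size, eta=0.1, steps=200, w=0.0):
    for _ in range(steps):
        batch = data if batch_size is None else random.sample(data, batch_size)
        w -= eta * grad_estimate(w, batch)
    return w

print("batch     :", run(None))  # exact gradient: smooth, clean trajectory
print("SGD       :", run(1))     # one sample per step: cheap but noisy
print("mini-batch:", run(32))    # the usual compromise between the two
```

All three land near the data mean; the batch size only controls how jittery the path is on the way there, which is exactly what the noise slider visualizes.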
GD vs Gradient Boosting
Same algorithm, different spaces. Shared controls advance both simultaneously — watch how each “step” works side by side.
w_new = w_old − η × gradient (moving a parameter downhill)
F_new = F_old + η × tree(residuals) (adding a tree that corrects errors)
Gradient Descent: w = 7.000 · Loss = 8.500 · Gradient = +4.000
Gradient Boosting: MSE = 17.429 · Max |residual| = 6.00
Step: 0 · η: 0.30
Side-by-side comparison. Hit “1 step” to advance both simultaneously. GD adjusts a single parameter w to minimize loss. GB adds a tree to correct prediction errors. Same η, same number of steps — watch both converge.
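The side-by-side loop can be sketched in a few lines. Everything here is illustrative: a made-up dataset, an assumed quadratic gradient for the GD panel, and the simplest possible "tree" (a depth-0 stump that predicts the mean residual):

```python
ys = [1.0, 3.0, 2.0, 8.0, 7.0, 9.0]   # made-up targets, not the demo's data

def gd_step(w, eta):
    # w_new = w_old - eta * gradient, with an assumed gradient of (w - 3)
    return w - eta * (w - 3.0)

def gb_step(preds, eta):
    # F_new = F_old + eta * tree(residuals); the "tree" is a depth-0 stump
    residuals = [y - p for y, p in zip(ys, preds)]
    stump = sum(residuals) / len(residuals)
    return [p + eta * stump for p in preds]

w, preds, eta = 7.0, [0.0] * len(ys), 0.3
for step in range(10):                 # same eta, same number of steps
    w = gd_step(w, eta)                # GD: move one parameter downhill
    preds = gb_step(preds, eta)        # GB: add a tree that corrects errors
mse = sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)
print(f"w = {w:.3f}, MSE = {mse:.3f}")
```

The parallel is exact: GD takes steps in parameter space, GB takes steps in function space, and in both cases η scales how much of each step is kept.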