Autoscaling — Not Magic

Spike the traffic and watch instances scale out on CPU, then scale back in. Feel the two gotchas: cold-start lag, and 50 app instances still choking on one slow database.

Press play, then hit 🔥 Spike. Watch instances boot to chase the load — and notice the requests dropped during the cold-start gap. Then flip on the slow database.

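[Interactive simulation. Control: traffic level, starting at 150 req/s. Readouts: traffic, instances, average CPU, and dropped requests (starting at 0/s). Instance fleet view: green = serving, amber = cold-starting. Initial status: "Keeping up: capacity matches demand."]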

What just happened

▹Autoscaling adds instances when a signal (here, CPU) crosses a threshold and removes them when load falls — capacity tracks demand instead of being fixed.
▹Scaling isn't instant: new instances have to cold-start. For those few seconds the existing instances are overloaded and drop requests, so a real spike still bites before scaling catches up.
▹Autoscaling on app CPU is blind to downstream limits. Turn on the slow database and watch requests keep dropping while CPU looks calm — 8 app instances can't beat one capped database.
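The three points above can be sketched as a small tick-based simulation. This is a minimal model under stated assumptions, not the code behind the demo: the `Fleet` class, its field names, and every number in it (100 req/s per instance, 70% / 30% CPU thresholds, a 3-tick cold start, a fleet cap of 8) are hypothetical and chosen only for illustration. It also assumes CPU tracks completed work, so instances waiting on a slow database look idle.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fleet:
    # All values are illustrative assumptions, not the demo's real settings.
    per_instance_rps: float = 100.0     # requests/s one warm instance can serve
    scale_out_cpu: float = 0.70         # add an instance above this average CPU
    scale_in_cpu: float = 0.30          # remove an instance below this average CPU
    cold_start_ticks: int = 3           # ticks before a new instance can serve
    min_instances: int = 1
    max_instances: int = 8
    db_cap_rps: Optional[float] = None  # None: the database is not a bottleneck
    ready: int = 3
    booting: List[int] = field(default_factory=list)  # ticks left per booting instance

    def tick(self, traffic_rps: float) -> dict:
        # 1. Advance cold starts; instances that reach zero begin serving.
        self.booting = [t - 1 for t in self.booting]
        self.ready += sum(1 for t in self.booting if t == 0)
        self.booting = [t for t in self.booting if t > 0]

        # 2. Serve what the app tier and (optionally) the database allow.
        app_capacity = self.ready * self.per_instance_rps
        served = min(traffic_rps, app_capacity)
        if self.db_cap_rps is not None:
            served = min(served, self.db_cap_rps)
        dropped = traffic_rps - served

        # Assumption: CPU reflects completed work only, so a capped database
        # keeps CPU low even while requests are being dropped.
        cpu = served / app_capacity

        # 3. Step scaling on CPU alone: at most one instance per tick.
        total = self.ready + len(self.booting)
        if cpu > self.scale_out_cpu and total < self.max_instances:
            self.booting.append(self.cold_start_ticks)
        elif cpu < self.scale_in_cpu and self.ready > self.min_instances:
            self.ready -= 1

        return {"ready": self.ready, "cpu": round(cpu, 2), "dropped": dropped}


# Spike to 600 req/s with a healthy database: drops shrink as instances warm up.
fleet = Fleet()
print([fleet.tick(600)["dropped"] for _ in range(8)])

# Same spike with the database capped at 200 req/s: CPU stays under the
# scale-out threshold, so the fleet never grows and drops never stop.
capped = Fleet(db_cap_rps=200)
print([capped.tick(600) for _ in range(8)][-1])
```

With these assumed numbers, the first run drops 300 req/s for the first three ticks and reaches zero by the sixth, once enough instances have finished booting. The capped run stays at 3 instances and about 67% CPU while dropping 400 req/s on every tick. In this model, scaling on a signal that includes the downstream limit (for example dropped requests or queue depth) is the only thing that would surface the problem, and even then more app instances would not raise the database's cap.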