💡 What’s happening (AI/ML mapping)
Customers = requests (RPS / concurrency)
Line length = queueing delay → latency
Baristas = compute (GPUs/replicas/workers)
Pantry speed = bandwidth / data movement (KV cache/memory/network)
Batch size = microbatching (bigger batches: throughput↑, per-request latency↑)
P95 wait = “95% of customers waited ≤ this time” (tail latency)
Goodput = drinks that actually succeed, no remakes (requests completed successfully, excluding failures and retries)
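
To make the mapping concrete, here is a minimal sketch of the coffee shop as a queue simulation. It is an illustration under stated assumptions, not a model of any real serving stack: customers arrive at random (Poisson arrivals), every drink takes the same fixed time, the line is first-come-first-served, and a drink fails independently with probability `fail_prob`. The names `simulate_cafe`, `percentile`, and `summarize` are hypothetical helpers introduced only for this example.

```python
import heapq
import math
import random


def simulate_cafe(arrival_rate, service_time, baristas, n_customers,
                  fail_prob=0.0, seed=0):
    """Simulate a first-come-first-served line with several baristas.

    Assumptions (illustrative only): Poisson arrivals, a fixed service
    time per drink, and independent failures with probability fail_prob.
    """
    rng = random.Random(seed)
    free_at = [0.0] * baristas  # min-heap: time each barista is next free
    clock = 0.0
    waits, succeeded = [], []
    for _ in range(n_customers):
        clock += rng.expovariate(arrival_rate)      # next customer arrives
        start = max(clock, heapq.heappop(free_at))  # earliest free barista
        heapq.heappush(free_at, start + service_time)
        waits.append(start - clock)                 # queueing delay
        succeeded.append(rng.random() >= fail_prob)
    makespan = max(free_at)  # time at which the last drink is finished
    return waits, succeeded, makespan


def percentile(values, p):
    """Nearest-rank percentile: smallest value with >= p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(waits, succeeded, makespan):
    """Return P95 wait, throughput (all drinks/s) and goodput (good drinks/s)."""
    return {
        "p95_wait": percentile(waits, 95),
        "throughput": len(waits) / makespan,
        "goodput": sum(succeeded) / makespan,
    }


# Compare staffing levels on the same arrival pattern (same seed).
for n in (1, 2, 4):
    stats = summarize(*simulate_cafe(arrival_rate=1.5, service_time=1.0,
                                     baristas=n, n_customers=2000,
                                     fail_prob=0.05))
    print(n, stats)
```

With one barista, customers arrive faster than drinks can be made, so the line keeps growing and the P95 wait grows with it. Adding baristas (replicas) shortens the tail. Goodput stays below throughput whenever `fail_prob` is above zero, because remade drinks consume barista time without counting as successes.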