🍂☕

Roofline Barista

💡 What’s happening (AI/ML mapping)

Customers = requests (RPS / concurrency)
Line length = queueing delay → latency
Baristas = compute (GPUs/replicas/workers)
Pantry speed = bandwidth / data movement (KV cache/memory/network)
Batch size = microbatching (throughput↑, latency↑)
P95 wait = “95% of customers waited ≤ this time” (tail latency)

Goodput = drinks per unit time that actually succeed (remakes don't count).
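The two metrics above can be sketched in a few lines. This is a minimal illustration, not the game's actual scoring: the function names `p95_wait` and `goodput`, the nearest-rank percentile method, and the sample numbers are all assumptions made for the example.

```python
def p95_wait(waits):
    """Nearest-rank P95: the smallest wait such that at least 95% of customers waited <= it."""
    if not waits:
        raise ValueError("need at least one wait time")
    ordered = sorted(waits)
    # Integer ceiling of 0.95 * n, which avoids floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100  # 1-based rank
    return ordered[rank - 1]


def goodput(drinks_served, remakes, seconds):
    """Successful drinks per second; remade drinks are not counted."""
    return (drinks_served - remakes) / seconds


# Hypothetical data: 20 customers who waited 1s, 2s, ..., 20s.
waits = list(range(1, 21))
print(p95_wait(waits))        # 19
print(goodput(100, 10, 60))   # 1.5
```

Note that one very slow customer barely moves the average but can move P95, which is why tail latency is tracked separately.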

🧑‍🍳 Baristas

1

More baristas = more capacity (compute).

🧺 Pantry speed

0.60×

If baristas stand idle waiting on the pantry, you're bandwidth-limited.
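The barista and pantry settings together follow the roofline idea: the shop can serve no faster than the smaller of its compute limit and its bandwidth limit. The sketch below assumes hypothetical rates (`drinks_per_barista`, `pantry_drinks_at_1x`) that are not taken from the game; only the `min` structure is the point.

```python
def service_rate(baristas, pantry_speed,
                 drinks_per_barista=1.0, pantry_drinks_at_1x=2.0):
    """Return (drinks per second, limiting resource) under a simple roofline model.

    drinks_per_barista and pantry_drinks_at_1x are made-up illustrative rates.
    """
    compute_limit = baristas * drinks_per_barista          # what the baristas could make
    bandwidth_limit = pantry_speed * pantry_drinks_at_1x   # what the pantry can supply
    rate = min(compute_limit, bandwidth_limit)
    bound = "compute" if compute_limit <= bandwidth_limit else "bandwidth"
    return rate, bound


# 1 barista with a 0.60x pantry: the barista is the bottleneck.
print(service_rate(1, 0.60))  # (1.0, 'compute')
# Adding a second barista shifts the bottleneck to the pantry.
print(service_rate(2, 0.60))  # (1.2, 'bandwidth')
```

Under these assumed rates, hiring beyond the second barista adds nothing until the pantry speeds up, which is the bandwidth-limited regime.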

📦 Batch size

Off

Unlocked later. More throughput, but each drink waits on its whole batch.
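The batching trade-off can be illustrated with a toy cost model, assuming each batch pays a fixed setup cost plus a per-drink cost. The parameters `setup_s` and `per_drink_s` are hypothetical, and the model ignores queueing and the time spent waiting for a batch to fill.

```python
def batch_stats(batch_size, setup_s=4.0, per_drink_s=1.0):
    """Return (throughput in drinks/s, latency in s) for one batch.

    Assumes every drink in a batch is handed over when the whole batch finishes.
    """
    batch_time = setup_s + batch_size * per_drink_s
    throughput = batch_size / batch_time  # the setup cost is shared across the batch
    latency = batch_time                  # each drink waits for the full batch
    return throughput, latency


print(batch_stats(1))  # (0.2, 5.0)
print(batch_stats(4))  # (0.5, 8.0)
```

In this sketch, going from a batch of 1 to a batch of 4 raises throughput from 0.2 to 0.5 drinks per second while per-drink latency grows from 5 to 8 seconds.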

✅ Double-checks

Medium

Unlocked later. Fewer remakes, but slower service.

Tutorial

Welcome
