Performance Playbook
- Edge-first rendering via Cloudflare Pages
- SSE and WebSocket pipelines built for sub-150ms p99
- Durable Object partitioning per session/tenant
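A minimal sketch of how a per-session/tenant partition key might be derived. The `Partition` type and `partitionName` helper are hypothetical names used for illustration; the playbook does not specify a key format.

```typescript
// Hypothetical shape: a tenant, optionally narrowed to one session.
type Partition = { tenantId: string; sessionId?: string };

// Build a stable, collision-free name for the partition.
// encodeURIComponent escapes ":" inside IDs so it cannot be
// confused with the ":" separator used in the name itself.
function partitionName(p: Partition): string {
  const tenant = encodeURIComponent(p.tenantId);
  if (p.sessionId === undefined) {
    return `tenant:${tenant}`;
  }
  return `tenant:${tenant}:session:${encodeURIComponent(p.sessionId)}`;
}
```

In a Worker, a name like this would typically be passed to the Durable Object namespace's `idFromName` and the resulting ID to `get`, so every request for the same session or tenant reaches the same object. The binding name is deployment-specific and not shown here.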
Latency Budgets
Component | Avg | p99 | Optimization |
---|---|---|---|
Edge Routing | 5ms | 12ms | Anycast |
Auth Validation | 20ms | 50ms | Token cache |
LLM Streaming | 80ms | 150ms | Shard models, reduce hops |
WS Broadcast | 15ms | 30ms | Affinity, minimal payloads |
DB Persistence | 10ms | 25ms | Edge DB and batching |
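One way to put the budgets to work is to compare measured samples against the p99 column. The sketch below assumes all table values are in milliseconds and uses a nearest-rank percentile; the `Budget`, `percentile`, and `overBudget` names are illustrative, not part of any existing tooling.

```typescript
interface Budget {
  component: string;
  avgMs: number;
  p99Ms: number;
}

// Values copied from the table above (assumed to be milliseconds).
const budgets: Budget[] = [
  { component: "Edge Routing", avgMs: 5, p99Ms: 12 },
  { component: "Auth Validation", avgMs: 20, p99Ms: 50 },
  { component: "LLM Streaming", avgMs: 80, p99Ms: 150 },
  { component: "WS Broadcast", avgMs: 15, p99Ms: 30 },
  { component: "DB Persistence", avgMs: 10, p99Ms: 25 },
];

// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Return the components whose measured p99 exceeds the budgeted p99.
// Components with no samples are skipped rather than flagged.
function overBudget(measured: Record<string, number[]>): string[] {
  return budgets
    .filter((b) => {
      const samples = measured[b.component];
      return (
        samples !== undefined &&
        samples.length > 0 &&
        percentile(samples, 99) > b.p99Ms
      );
    })
    .map((b) => b.component);
}
```

With only a handful of samples the nearest-rank p99 is simply the maximum, so a check like this is only meaningful over a reasonably large window of measurements.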
Caching and Batching
- Edge cache for static assets
- Prompt/result caching where safe (deterministic responses with no user-specific data)
- Batch DB writes and debounce broadcasts
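The write-batching idea can be sketched as a small buffer that flushes once it reaches a size limit. `WriteBatcher` and its `sink` callback are hypothetical; the sink is synchronous here to keep the example short, whereas a real database write would normally be asynchronous and need error handling.

```typescript
// Collects items and hands them to `sink` in groups of up to `maxBatch`.
class WriteBatcher<T> {
  private pending: T[] = [];
  private readonly maxBatch: number;
  private readonly sink: (batch: T[]) => void;

  constructor(maxBatch: number, sink: (batch: T[]) => void) {
    if (maxBatch < 1) throw new Error("maxBatch must be at least 1");
    this.maxBatch = maxBatch;
    this.sink = sink;
  }

  // Queue one item; flush automatically when the batch is full.
  add(item: T): void {
    this.pending.push(item);
    if (this.pending.length >= this.maxBatch) {
      this.flush();
    }
  }

  // Send whatever is queued. Safe to call when nothing is pending.
  flush(): void {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = []; // swap first so the sink never sees later items
    this.sink(batch);
  }
}
```

A timer that calls `flush()` at a fixed interval would bound how long a partial batch waits, and the same buffer-then-flush pattern is one way to debounce broadcasts.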