Observability

  • Metrics: latency, token usage, error rates, WS connections
  • Logs: structured JSON with correlation IDs
  • Tracing: edge traces and spans across chat → tools → storage
  • Alerts: error budgets, quota breaches, degraded regions
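
As a rough illustration of the "structured JSON with correlation IDs" item, the sketch below formats one log line. The function name `formatLog` and the field names (`timestamp`, `level`, `message`, `correlationId`) are assumptions made for this example, not the service's actual log schema.

```typescript
// Hypothetical log shape; field names are illustrative only.
interface LogEntry {
  timestamp: string;
  level: "debug" | "info" | "warn" | "error";
  message: string;
  correlationId: string;
  [key: string]: unknown;
}

// Build one JSON log line. The caller passes the correlation ID in, so every
// hop of a request (chat, tools, storage) can log under the same ID.
function formatLog(
  level: LogEntry["level"],
  message: string,
  correlationId: string,
  fields: Record<string, unknown> = {},
): string {
  const entry: LogEntry = {
    ...fields, // extra fields first, so they cannot overwrite the core fields below
    timestamp: new Date().toISOString(),
    level,
    message,
    correlationId,
  };
  return JSON.stringify(entry);
}

// Example: one line per event, all sharing the request's correlation ID.
console.log(formatLog("info", "tool call finished", "req-123", { tool: "search", latencyMs: 42 }));
```

Filtering logs by a single `correlationId` value then reconstructs one request's path across services, which complements the trace spans listed above.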

SLOs and SLAs

Service             | SLO              | Notes
Chat Streaming      | p99 < 150ms step | Measured at edge SSE
WebSocket Broadcast | p99 < 30ms       | Durable Object affinity
Auth Validation     | p99 < 50ms       | Token cache enabled
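
To make the p99 targets concrete, here is a minimal sketch of checking a batch of latency samples against the table. It uses the nearest-rank percentile definition, which is one common convention and may differ from what the metrics backend actually computes. The names `percentile`, `SLO_P99_MS`, and `meetsSlo` are hypothetical.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point drift (e.g. 0.07 * 100).
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// p99 targets from the SLO table, in milliseconds.
const SLO_P99_MS = {
  chatStreaming: 150, // per streaming step, measured at edge SSE
  webSocketBroadcast: 30,
  authValidation: 50,
} as const;

// True when the observed p99 is strictly below the target for that service.
function meetsSlo(service: keyof typeof SLO_P99_MS, latenciesMs: number[]): boolean {
  return percentile(latenciesMs, 99) < SLO_P99_MS[service];
}
```

With 100 samples, a single slow outlier does not move the nearest-rank p99, but two do, so a p99 target tolerates roughly 1% of slow requests per window.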

Health Checks

  • /health verifies DB, integrations, and secrets
  • Synthetic probes per region
  • Canary deploy with automatic rollback
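
The `/health` aggregation could look roughly like the sketch below. It assumes each dependency (DB, integrations, secrets) is exposed as an async function that rejects when unhealthy. `runHealthChecks`, `withTimeout`, the 2-second default timeout, and the report shape are all hypothetical, and the real endpoint may report differently.

```typescript
// A check resolves when the dependency is healthy and rejects otherwise.
type Check = () => Promise<void>;

interface HealthReport {
  status: 200 | 503;
  body: { ok: boolean; checks: Record<string, "ok" | "fail"> };
}

// Reject if the promise does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Run all checks concurrently; any failure or timeout yields a 503 report.
async function runHealthChecks(
  checks: Record<string, Check>,
  timeoutMs = 2000,
): Promise<HealthReport> {
  const names = Object.keys(checks);
  const outcomes = await Promise.all(
    names.map(async (name): Promise<"ok" | "fail"> => {
      try {
        await withTimeout(checks[name](), timeoutMs);
        return "ok";
      } catch {
        return "fail";
      }
    }),
  );
  const result: Record<string, "ok" | "fail"> = {};
  names.forEach((name, i) => { result[name] = outcomes[i]; });
  const ok = outcomes.every((o) => o === "ok");
  return { status: ok ? 200 : 503, body: { ok, checks: result } };
}
```

A route handler would serialize `body` as JSON and return it with `status`. Reporting per-dependency results lets synthetic probes distinguish a database failure from a missing secret.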

Runbooks

  • Stripe webhook retries
  • Supabase outage fallback (read-only)
  • Edge region failover

See also: Architecture, Performance.