Observability

Observability makes an application's runtime behavior visible enough to diagnose problems quickly.

Three pillars to implement

  • logs
  • metrics
  • traces (or at minimum request IDs)

Logging baseline

  • structured logs in production
  • request IDs in every app log line
  • access logs enabled where auditability is required
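
A minimal sketch of this baseline uses only Python's standard `logging`, `json`, and `contextvars` modules. It assumes the application sets `request_id_var` (a hypothetical name) at the start of each request. The JSON field names (`ts`, `level`, `request_id`, ...) are illustrative, not a required schema.

```python
import contextvars
import json
import logging

# Hypothetical context variable holding the current request's ID ("-" outside a request).
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every record passing through the handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object per line (structured logging)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "request_id": getattr(record, "request_id", "-"),
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str = "app") -> logging.Logger:
    """Hypothetical helper: a logger whose handler adds request IDs and emits JSON."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    handler.addFilter(RequestIdFilter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers = [handler]
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger()
    token = request_id_var.set("req-123")  # normally done by request middleware
    log.info("order created")  # -> one JSON line with "request_id": "req-123"
    request_id_var.reset(token)
```

Because every line is one JSON object, a log pipeline can filter on `request_id` without fragile text parsing.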

Metrics baseline

Track at least:

  • request rate
  • error rate
  • p50/p95/p99 latency
  • active connections
  • worker restarts
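
In production these numbers usually come from a metrics library or backend rather than hand-rolled code. The sketch below only illustrates the arithmetic behind request rate, error rate, and p50/p95/p99 latency over one in-memory window. It uses nearest-rank percentiles and assumes, for illustration, that only 5xx responses count as errors. `Sample`, `percentile`, and `summarize` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Sample:
    """One completed request: its latency and HTTP status code."""
    latency_ms: float
    status: int


def percentile(sorted_values: Sequence[float], p: int) -> float:
    """Nearest-rank percentile over an already sorted, non-empty sequence."""
    # Multiplying before dividing keeps the rank exact when p * n is a
    # multiple of 100 (e.g. p95 of 100 samples -> rank 95).
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(samples: Sequence[Sample], window_s: float) -> Dict[str, float]:
    """Request rate, error rate, and latency percentiles for one time window."""
    if not samples:
        raise ValueError("no samples in window")
    latencies = sorted(s.latency_ms for s in samples)
    # Assumption: only server errors (5xx) count toward the error rate.
    errors = sum(1 for s in samples if s.status >= 500)
    return {
        "request_rate_per_s": len(samples) / window_s,
        "error_rate": errors / len(samples),
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
    }
```

Active connections and worker restarts are different in kind. They are a gauge and a counter reported by the server process, not values derived from per-request samples.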

Tracing/request correlation

If full distributed tracing is unavailable, add request ID middleware and propagate the ID through request/response headers and every log line for that request.
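
A minimal WSGI sketch of such middleware follows. It assumes the `X-Request-ID` header, which is a common convention rather than a formal standard. `RequestIdMiddleware` and the `app.request_id` environ key are hypothetical names. The middleware reuses a well-formed incoming ID or mints a UUID, exposes the ID to the wrapped app, logs it, and echoes it in the response headers.

```python
import logging
import re
import uuid

logger = logging.getLogger("app.requests")

# Assumption: "X-Request-ID" is a widespread convention, not a formal standard.
REQUEST_ID_HEADER = "X-Request-ID"
# WSGI exposes incoming request headers as HTTP_* environ keys.
INCOMING_KEY = "HTTP_X_REQUEST_ID"
# Only trust short, plain incoming IDs so clients cannot inject junk into logs.
_VALID_ID = re.compile(r"[A-Za-z0-9._-]{1,128}")


class RequestIdMiddleware:
    """Reuse a valid incoming request ID or mint a new one, expose it to the
    wrapped app, log it, and echo it back in the response headers."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        incoming = environ.get(INCOMING_KEY, "")
        request_id = incoming if _VALID_ID.fullmatch(incoming) else uuid.uuid4().hex
        # Hypothetical environ key the app can read to tag its own log lines.
        environ["app.request_id"] = request_id
        logger.info("request received", extra={"request_id": request_id})

        def start_response_with_id(status, headers, exc_info=None):
            # Drop any ID header the app set itself, then append ours.
            headers = [(k, v) for k, v in headers if k.lower() != REQUEST_ID_HEADER.lower()]
            headers.append((REQUEST_ID_HEADER, request_id))
            return start_response(status, headers, exc_info)

        return self.app(environ, start_response_with_id)
```

Wrap the WSGI callable once, e.g. `application = RequestIdMiddleware(application)`. When a proxy or upstream service already sends a valid `X-Request-ID`, the same ID then follows the request end to end.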

Alerting priorities

  • sustained error-rate increase
  • latency SLO violation
  • repeated worker crash loops
  • health endpoint failures
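
The word "sustained" matters: a single noisy window should not page anyone. The following is a minimal sketch of that idea, similar in spirit to the `for:` duration on a Prometheus alerting rule. `SustainedErrorRateAlert` is a hypothetical name, and its default threshold, window count, and minimum-traffic cutoff are illustrative values, not recommendations.

```python
from collections import deque
from typing import Deque


class SustainedErrorRateAlert:
    """Fire only when the error rate exceeds a threshold for several
    consecutive evaluation windows (defaults are illustrative only)."""

    def __init__(self, threshold: float = 0.05, windows: int = 3, min_requests: int = 20):
        self.threshold = threshold
        self.min_requests = min_requests
        # Keeps the breach/no-breach result of the most recent windows only.
        self.recent: Deque[bool] = deque(maxlen=windows)

    def observe(self, requests: int, errors: int) -> bool:
        """Record one window's counts; return True if the alert should fire."""
        # Windows with too little traffic are treated as healthy, which also
        # avoids dividing by zero when there were no requests.
        breached = (
            requests > 0
            and requests >= self.min_requests
            and errors / requests > self.threshold
        )
        self.recent.append(breached)
        return len(self.recent) == self.recent.maxlen and all(self.recent)
```

The same pattern applies to latency SLO violations: compare each window's p95 or p99 against the objective, and alert only on consecutive breaches.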

Non-technical summary

Observability is how teams answer three incident questions quickly:

  • what is broken
  • how bad it is
  • where to act first