Guide: Troubleshooting
Use this page as a fast diagnosis flow.
Step 1: Capture context first
Always capture:
- exact startup command
- Palfrey and Python version (palfrey --version)
- platform
- error logs with timestamps
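The checklist above can be gathered in one pass. The following is a minimal sketch, not a Palfrey feature: `capture_context` is a hypothetical helper name, and the only Palfrey-specific call it makes is the `palfrey --version` command named in the list.

```python
import datetime
import platform
import shlex
import shutil
import subprocess
import sys


def capture_context(command):
    """Collect the basics worth attaching to a bug report (hypothetical helper)."""
    context = {
        "captured_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "command": shlex.join(command),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    # If the executable is missing or misbehaves, record that instead of failing.
    if shutil.which("palfrey") is None:
        context["palfrey"] = "not found on PATH"
        return context
    try:
        result = subprocess.run(
            ["palfrey", "--version"],
            capture_output=True,
            text=True,
            check=False,
            timeout=10,
        )
        context["palfrey"] = (result.stdout or result.stderr).strip()
    except (OSError, subprocess.TimeoutExpired) as exc:
        context["palfrey"] = f"version check failed: {exc}"
    return context


if __name__ == "__main__":
    # Pass the exact startup command as arguments to this script.
    for key, value in capture_context(sys.argv[1:]).items():
        print(f"{key}: {value}")
```

Paste the printed block next to the timestamped error logs so whoever picks up the issue sees the same context.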
Step 2: Identify failure class
Startup/import failures
Symptoms:
- process exits immediately
- import/module/factory errors
Actions:
- verify APP target
- verify working directory or use --app-dir
- verify virtual environment dependencies
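To separate "the target is wrong" from "the server is wrong", it can help to import the target by hand in the same virtual environment. This sketch assumes the target uses a `module:attribute` form, which is common among ASGI servers but is not confirmed by this page. `resolve_target` is a hypothetical helper, and its `app_dir` argument only approximates what an `--app-dir` option is assumed to do.

```python
import importlib
import sys
from typing import Optional


def resolve_target(target: str, app_dir: Optional[str] = None):
    """Import 'module:attribute' and return the attribute (hypothetical helper)."""
    module_name, sep, attr_path = target.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {target!r}")
    if app_dir is not None and app_dir not in sys.path:
        # Assumption: an --app-dir style option makes this directory importable.
        sys.path.insert(0, app_dir)
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(f"cannot import module {module_name!r}: {exc}") from exc
    obj = module
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            raise AttributeError(f"{module_name!r} has no attribute {attr_path!r}")
        obj = getattr(obj, part)
    return obj
```

If this raises in a plain Python shell, the problem lies in the target string, the working directory, or the environment rather than in the server.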
Bind/socket failures
Symptoms:
- address in use
- socket path errors
Actions:
- free conflicting process/port
- verify socket permissions/path
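Both checks above can be reproduced outside the server. The sketch below uses only the standard library; `can_bind` and `check_socket_path` are hypothetical helper names, and the result only reflects the moment the check runs.

```python
import os
import socket
from typing import Optional


def can_bind(host: str, port: int) -> bool:
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


def check_socket_path(path: str) -> Optional[str]:
    """Return a description of the first problem found, or None if the path looks usable."""
    parent = os.path.dirname(os.path.abspath(path))
    if not os.path.isdir(parent):
        return "parent directory does not exist"
    if not os.access(parent, os.W_OK):
        return "parent directory is not writable"
    if os.path.exists(path):
        return "path already exists (possibly a stale socket file)"
    return None
```

When `can_bind` returns False, look for the process that holds the port before changing any server settings.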
Request/runtime failures
Symptoms:
- 4xx/5xx responses
- slow responses under load
Actions:
- inspect app exception logs
- check concurrency/timeout settings
- verify dependency health (DB/cache/API)
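When application logs do not already include status and duration, a thin ASGI wrapper can add them temporarily. This is a sketch of one way to do that, not something Palfrey provides: `timed` and its `report` callback are hypothetical names.

```python
import time


def timed(app, report=print):
    """Wrap an ASGI app and report status and duration for each HTTP request."""

    async def wrapper(scope, receive, send):
        if scope["type"] != "http":
            await app(scope, receive, send)
            return

        started = time.perf_counter()
        status = None

        async def recording_send(message):
            nonlocal status
            if message["type"] == "http.response.start":
                status = message["status"]
            await send(message)

        try:
            await app(scope, receive, recording_send)
        finally:
            # Runs even if the app raises, so failed requests are still reported.
            elapsed_ms = (time.perf_counter() - started) * 1000
            method = scope.get("method", "?")
            path = scope.get("path", "?")
            report(f"{method} {path} -> {status} in {elapsed_ms:.1f} ms")

    return wrapper
```

A status of `None` in the report means the app raised or returned before it started a response, which points at application code rather than at a dependency.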
WebSocket failures
Symptoms:
- handshake rejection
- connection closes immediately
Actions:
- verify upgrade headers end-to-end
- verify proxy websocket forwarding
- test direct server connection
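To test a direct connection without a browser or a proxy in the path, the opening handshake can be sent by hand. This sketch builds a standard RFC 6455 upgrade request over a plain TCP socket (no TLS); the helper names are hypothetical, and the host, port, and path are whatever your deployment uses.

```python
import base64
import os
import socket


def build_handshake(host: str, path: str = "/") -> bytes:
    """Build a minimal WebSocket upgrade request."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def status_line(response: bytes) -> str:
    """Return the first line of a raw HTTP response."""
    return response.split(b"\r\n", 1)[0].decode("latin-1")


def probe(host: str, port: int, path: str = "/", timeout: float = 5.0) -> str:
    """Send the handshake and return the server's status line."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_handshake(f"{host}:{port}", path))
        return status_line(conn.recv(4096))
```

A status line containing `101` means the server accepted the upgrade. If the direct probe succeeds but the same request through the proxy does not, the proxy is most likely dropping the `Upgrade` or `Connection` header.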
Reload/worker behavior surprises
Symptoms:
- reload not triggered
- unexpected worker exits
Actions:
- validate include/exclude patterns
- verify process model (reload vs workers)
- inspect healthcheck and recycle settings
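Include and exclude patterns are easy to get subtly wrong, so it can help to test them against real file paths before restarting anything. The sketch below uses `fnmatch` semantics purely as an illustration; Palfrey's actual matching rules are not described on this page and may differ. `would_trigger_reload` is a hypothetical helper.

```python
import fnmatch


def would_trigger_reload(path, includes, excludes):
    """Return True if path matches an include pattern and no exclude pattern.

    Assumption: fnmatch-style matching, where '*' also matches '/'.
    """
    name = path.replace("\\", "/")
    included = any(fnmatch.fnmatchcase(name, pattern) for pattern in includes)
    excluded = any(fnmatch.fnmatchcase(name, pattern) for pattern in excludes)
    return included and not excluded
```

Run it over the files you edited: a file you expect to trigger a reload but that returns False points to a pattern problem rather than a watcher problem.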
Reference probe app
from __future__ import annotations

import time

STARTED = time.time()


async def app(scope, receive, send):
    """Expose uptime and status for probe endpoints."""
    if scope["type"] != "http":
        return

    path = scope.get("path", "/")
    if path == "/healthz":
        body = b"ok"
        status = 200
    elif path == "/readyz":
        body = f"uptime={time.time() - STARTED:.2f}".encode()
        status = 200
    else:
        body = b"not found"
        status = 404

    await send(
        {
            "type": "http.response.start",
            "status": status,
            "headers": [
                (b"content-type", b"text/plain"),
                (b"content-length", str(len(body)).encode("ascii")),
            ],
        }
    )
    await send({"type": "http.response.body", "body": body})
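An ASGI app like the one above can be exercised in-process, which takes the server and the network out of the picture. The following is a minimal sketch: `call_http` is a hypothetical helper, and the scope it builds contains only the fields a simple HTTP app typically reads.

```python
import asyncio


async def call_http(app, path: str):
    """Send one GET request to an ASGI app in-process and return (status, body)."""
    messages = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        messages.append(message)

    scope = {
        "type": "http",
        "asgi": {"version": "3.0"},
        "http_version": "1.1",
        "method": "GET",
        "path": path,
        "raw_path": path.encode("ascii"),
        "query_string": b"",
        "headers": [],
    }
    await app(scope, receive, send)

    status = next(m["status"] for m in messages if m["type"] == "http.response.start")
    body = b"".join(
        m.get("body", b"") for m in messages if m["type"] == "http.response.body"
    )
    return status, body
```

With the probe app in scope, `asyncio.run(call_http(app, "/healthz"))` returns `(200, b"ok")`. If that works in-process but the served endpoint fails, the fault is in the server configuration or the network path rather than in the app.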
Incident handoff template
- what happened
- when it started
- impact scope
- current mitigation
- next investigation step
Plain-language summary
Troubleshooting speed improves when teams classify the problem first, then debug inside the correct category.