FastAPI Graceful Shutdown: Handling SIGTERM in Kubernetes
BACKEND SERIES
Day 25: The Silent Goodbye — Masterful Graceful Shutdowns
⏳ Context: It’s 3 PM on a Friday. A P1 vulnerability just hit your desk. You hit 'Deploy' on the fix, feeling like a hero. Thirty seconds later, your PagerDuty explodes. Not because of the vulnerability, but because your deployment just killed 5,000 active AI processing tasks and dropped 10,000 live WebSocket connections. Your server didn't say goodbye; it just died. We’ve all been there—thinking Kubernetes handles the "grace" for us. It doesn't. You have to build the exit door yourself.
How to Implement Graceful Shutdowns in Kubernetes Using the "ReadFlag" Strategy
1. The SIGTERM Betrayal
When Kubernetes decides to kill your Pod (during a rolling update or scale-down), it sends a SIGTERM signal. Most Python devs rely on FastAPI's @app.on_event("shutdown") hook or the newer lifespan context manager. Here is the problem: lifespan events are often too late.
By the time the framework-level shutdown event fires, the server has typically stopped accepting new requests, and in many configurations it has already severed WebSocket transport connections. If you have tasks in a background queue or users waiting for a final socket message, they are already gone. To fix this, we need to intercept the signal at the OS level before the framework starts panicking.
Raw Signal Interception Logic (GitHub):
github.com/.../graceful_shutdown.py
2. The "ReadFlag" Strategy
The core of a production-grade shutdown is a global boolean: readFlag. By default, it's True. Your /healthz endpoint returns 200 OK. The moment SIGTERM hits, you flip that flag to False. Suddenly, your health check returns 503 Service Unavailable.
Kubernetes' readinessProbe will see the 503, wait for the failureThreshold, and then stop sending new traffic to that Pod. This gives your existing connections a "quiet period" to finish their work without being bombarded by new users.
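To make this concrete, here is a minimal readinessProbe sketch for the container spec (the port and failureThreshold values are illustrative assumptions; tune them to your deployment):

```yaml
# Fragment of a container spec in deployment.yaml (illustrative values)
readinessProbe:
  httpGet:
    path: /healthz
    port: 8000
  periodSeconds: 5
  failureThreshold: 2   # after ~2 failed checks post-SIGTERM, traffic stops routing here
```

Note that the readinessProbe only gates traffic; it does not restart the container. That is exactly what we want during a drain.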
```python
import signal

from fastapi import FastAPI, Response

app = FastAPI()

# The "Lifeboat" flag
SHOULD_ACCEPT_TRAFFIC = True

def handle_sigterm(*_):
    global SHOULD_ACCEPT_TRAFFIC
    SHOULD_ACCEPT_TRAFFIC = False
    # Log for K8s logs to show we intercepted correctly
    print("SIGTERM received. Draining traffic...")

# Register the OS signal immediately
signal.signal(signal.SIGTERM, handle_sigterm)

@app.get("/healthz")
async def readiness_probe():
    if not SHOULD_ACCEPT_TRAFFIC:
        return Response(status_code=503)
    return {"status": "ok"}

@app.post("/submit-task")
async def create_order():
    # Race condition protection: K8s might send one last request
    if not SHOULD_ACCEPT_TRAFFIC:
        return Response(status_code=503, content="Server Terminating")
    # Process logic here...
    return {"status": "processing"}
```
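You can sanity-check the handler locally before touching Kubernetes. A standalone sketch (it re-declares the flag and handler so it runs on its own, without a server) that sends the process its own SIGTERM, just like the kubelet would:

```python
import os
import signal
import time

SHOULD_ACCEPT_TRAFFIC = True

def handle_sigterm(*_):
    global SHOULD_ACCEPT_TRAFFIC
    SHOULD_ACCEPT_TRAFFIC = False

signal.signal(signal.SIGTERM, handle_sigterm)

# Deliver SIGTERM to ourselves, as Kubernetes does on Pod termination
os.kill(os.getpid(), signal.SIGTERM)
time.sleep(0.1)  # give the interpreter a moment to run the handler
print(SHOULD_ACCEPT_TRAFFIC)  # False
```

If the flag flips, your interception works; from here, the /healthz endpoint takes over the draining.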
The Shareable Quote: "A server that can't say goodbye properly is a server that shouldn't be trusted with state. Graceful shutdowns are the difference between a professional architecture and a lucky one."
🛠️ Day 25 Project: The Kubernetes Survivalist
Configuring the code is only half the battle. You need to configure the orchestrator.
- Write a `deployment.yaml` that sets `terminationGracePeriodSeconds: 60`.
- Configure a `readinessProbe` pointing to `/healthz` with a `periodSeconds: 5`.
- In your Python code, implement a task counter. If `SIGTERM` is received, wait in a `while active_tasks > 0: await asyncio.sleep(1)` loop before finally exiting.
- Bonus: Send a "Server maintenance starting" message to all WebSocket clients before the flag flips.
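The drain loop from step 3 can be sketched like this (a minimal version, assuming a module-level `active_tasks` counter; the function name and the `max_wait` budget are illustrative — keep the budget below `terminationGracePeriodSeconds` so you exit before Kubernetes escalates to SIGKILL):

```python
import asyncio
import signal

# Hypothetical module-level state for the ReadFlag pattern
SHOULD_ACCEPT_TRAFFIC = True
active_tasks = 0

def handle_sigterm(*_):
    """Flip the readiness flag; /healthz starts returning 503."""
    global SHOULD_ACCEPT_TRAFFIC
    SHOULD_ACCEPT_TRAFFIC = False

signal.signal(signal.SIGTERM, handle_sigterm)

async def drain(poll_interval: float = 1.0, max_wait: float = 55.0) -> bool:
    """Wait for in-flight tasks to finish before exiting.

    Returns True if everything drained, False if we hit the deadline.
    """
    waited = 0.0
    while active_tasks > 0 and waited < max_wait:
        await asyncio.sleep(poll_interval)
        waited += poll_interval
    return active_tasks == 0
```

Increment `active_tasks` when a request starts real work and decrement it when the work finishes; call `drain()` from your shutdown path after the flag flips.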
Draining traffic is great, but how do you know which traffic to drain? Tomorrow, we look at the eyes of the system: Day 26: Distributed Tracing & Observability.