
FastAPI Graceful Shutdown: Handling SIGTERM in Kubernetes

BACKEND SERIES

Day 25: The Silent Goodbye — Masterful Graceful Shutdowns

Series: Logic & Legacy
Day 25 / 50
Level: Senior / SRE

Context: It’s 3 PM on a Friday. A P1 vulnerability just hit your desk. You hit 'Deploy' on the fix, feeling like a hero. Thirty seconds later, your PagerDuty explodes. Not because of the vulnerability, but because your deployment just killed 5,000 active AI processing tasks and dropped 10,000 live WebSocket connections. Your server didn't say goodbye; it just died. We’ve all been there—thinking Kubernetes handles the "grace" for us. It doesn't. You have to build the exit door yourself.

Figure: How to Implement Graceful Shutdowns in Kubernetes using the ReadFlag Strategy (the chaotic default "SIGTERM Betrayal" shutdown vs. the organized, five-step ReadFlag strategy).


1. The SIGTERM Betrayal

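When Kubernetes decides to kill your Pod (during a rolling update or scale-down), it sends SIGTERM to each container's main process, then waits terminationGracePeriodSeconds (30 seconds by default) before sending SIGKILL. Most Python devs rely on FastAPI's deprecated @app.on_event("shutdown") hook or its replacement, the lifespan context manager. Here is the problem: lifespan shutdown code often runs too late.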

By the time the framework-level shutdown event fires, the server (Uvicorn, in most FastAPI deployments) has already stopped accepting new connections and has closed open WebSocket connections from its side. Work sitting in an in-process background queue gets no say in that sequence, and a user waiting for a final socket message has already been disconnected. To fix this, we need to intercept the signal at the OS level, before the server starts its own shutdown sequence.
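Stripped of any framework, "intercepting at the OS level" just means replacing SIGTERM's default action (terminate the process) with a Python handler that flips a flag and lets the process keep running. The sketch below assumes a POSIX system and signals itself with os.kill to simulate Kubernetes; accepting_traffic and on_sigterm are illustrative names. One caveat: signal.signal keeps only one handler per signal, so a server that installs its own SIGTERM handler at startup (Uvicorn does) replaces yours for as long as it runs.

```python
import os
import signal
import time

# Illustrative module-level ReadFlag.
accepting_traffic = True


def on_sigterm(signum: int, frame: object) -> None:
    """Flip the flag instead of exiting; the process keeps running."""
    global accepting_traffic
    accepting_traffic = False


# Replace the default SIGTERM action (terminate) with our handler.
previous = signal.signal(signal.SIGTERM, on_sigterm)

# Simulate Kubernetes sending SIGTERM by signalling ourselves (POSIX only).
os.kill(os.getpid(), signal.SIGTERM)
time.sleep(0.1)  # CPython runs Python-level handlers in the main thread

print("still alive, accepting_traffic =", accepting_traffic)

# Restore whatever handler was installed before (usually the default).
signal.signal(signal.SIGTERM, previous)
```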

Raw Signal Interception Logic (GitHub):

github.com/.../graceful_shutdown.py

2. The "ReadFlag" Strategy

The core of a production-grade shutdown is a global boolean, the ReadFlag (SHOULD_ACCEPT_TRAFFIC in the code below). By default, it's True and your /healthz endpoint returns 200 OK. The moment SIGTERM hits, you flip that flag to False, and from then on the health check returns 503 Service Unavailable.

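Kubernetes' readinessProbe will see the 503 and, after failureThreshold consecutive failures, mark the Pod NotReady, so its Services stop sending it new traffic. Kubernetes also begins removing a terminating Pod from Service endpoints on its own, but that change reaches kube-proxy and ingress controllers asynchronously; the failing probe is an explicit second signal for anything that health-checks the Pod directly. Either way, your existing connections get a "quiet period" to finish their work without being bombarded by new users.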
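How long is that "wait for the failureThreshold"? A rough upper bound, assuming the kubelet keeps probing after the flag flips and ignoring timeoutSeconds (a 503 comes back fast), preStop hooks, and endpoint propagation delay: the first failing probe can land up to one full periodSeconds after the flip, and each further failure is one period later. The helper names below are hypothetical; failureThreshold defaults to 3 in Kubernetes.

```python
def worst_case_notready_seconds(period_seconds: int, failure_threshold: int = 3) -> int:
    """Upper bound on the time from the flag flip until the Pod is marked NotReady."""
    # First failure: up to one period after the flip. NotReady needs
    # `failure_threshold` consecutive failures, one period apart:
    # period + (failure_threshold - 1) * period == failure_threshold * period.
    return failure_threshold * period_seconds


def drain_budget_seconds(grace_seconds: int, period_seconds: int, failure_threshold: int = 3) -> int:
    """Seconds left for in-flight work once the probe has pulled the Pod out of rotation."""
    return grace_seconds - worst_case_notready_seconds(period_seconds, failure_threshold)


# Day 25 project values: periodSeconds: 5, terminationGracePeriodSeconds: 60.
print(worst_case_notready_seconds(5))  # 15
print(drain_budget_seconds(60, 5))     # 45
```

If the budget comes out small or negative, either shorten periodSeconds or raise terminationGracePeriodSeconds.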

The Production Shutdown Guard
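import signal
import threading

import uvicorn
from fastapi import FastAPI, Response

app = FastAPI()
# The "Lifeboat" flag (the ReadFlag)
SHOULD_ACCEPT_TRAFFIC = True
# Seconds to keep serving after SIGTERM; keep this well below terminationGracePeriodSeconds
DRAIN_SECONDS = 20

class GracefulServer(uvicorn.Server):
    # Uvicorn installs its own SIGTERM handler at startup, which replaces any
    # handler registered with signal.signal() at import time, so hook in here.
    def handle_exit(self, sig, frame):
        global SHOULD_ACCEPT_TRAFFIC
        if sig != signal.SIGTERM or not SHOULD_ACCEPT_TRAFFIC:
            # Ctrl+C or a second SIGTERM: fall back to Uvicorn's immediate shutdown
            return super().handle_exit(sig, frame)
        SHOULD_ACCEPT_TRAFFIC = False
        # Log for K8s logs to show we intercepted correctly
        print("SIGTERM received. Draining traffic...")
        # Keep serving (and answering probes) during the drain window
        timer = threading.Timer(DRAIN_SECONDS, self.begin_shutdown)
        timer.daemon = True
        timer.start()

    def begin_shutdown(self):
        # Uvicorn's main loop polls should_exit, then runs its normal shutdown
        self.should_exit = True

@app.get("/healthz")
async def readiness_probe():
    if not SHOULD_ACCEPT_TRAFFIC:
        return Response(status_code=503)
    return {"status": "ok"}

@app.post("/submit-task")
async def create_order():
    # Race condition protection: K8s might still route a last request here
    if not SHOULD_ACCEPT_TRAFFIC:
        return Response(status_code=503, content="Server Terminating")

    # Process logic here...
    return {"status": "processing"}

if __name__ == "__main__":
    # Start through this entry point (not the uvicorn CLI) so GracefulServer is used
    GracefulServer(uvicorn.Config(app, host="0.0.0.0", port=8000)).run()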

The Shareable Quote: "A server that can't say goodbye properly is a server that shouldn't be trusted with state. Graceful shutdowns are the difference between a professional architecture and a lucky one."

🛠️ Day 25 Project: The Kubernetes Survivalist

Configuring the code is only half the battle. You need to configure the orchestrator.

  • Write a deployment.yaml that sets terminationGracePeriodSeconds: 60.
  • Configure a readinessProbe pointing to /healthz with periodSeconds: 5.
  • In your Python code, implement a task counter. When SIGTERM is received, run a while active_tasks > 0: await asyncio.sleep(1) loop before finally exiting, and cap that wait below terminationGracePeriodSeconds, because Kubernetes sends SIGKILL when the grace period runs out.
  • Bonus: Send a "Server maintenance starting" message to all WebSocket clients before the flag flips.
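For the task-counter step, here is a minimal asyncio sketch. ActiveTasks, track, drain, and fake_ai_task are hypothetical names (not FastAPI APIs), and fake_ai_task stands in for real AI processing. drain mirrors the project's polling loop but adds a timeout, so the drain gives up before Kubernetes' SIGKILL would cut it off anyway.

```python
import asyncio
from contextlib import asynccontextmanager


class ActiveTasks:
    """Hypothetical counter of in-flight work items."""

    def __init__(self) -> None:
        self.count = 0

    @asynccontextmanager
    async def track(self):
        # Count the task for exactly as long as it runs, even if it raises.
        self.count += 1
        try:
            yield
        finally:
            self.count -= 1

    async def drain(self, timeout: float, poll: float = 1.0) -> bool:
        """Wait until no tasks are active; return False if the timeout hits first."""
        loop = asyncio.get_running_loop()
        deadline = loop.time() + timeout
        while self.count > 0:
            if loop.time() >= deadline:
                return False
            await asyncio.sleep(poll)
        return True


active = ActiveTasks()


async def fake_ai_task(seconds: float) -> None:
    async with active.track():
        await asyncio.sleep(seconds)  # placeholder for real work


async def main() -> tuple[bool, int]:
    workers = [asyncio.create_task(fake_ai_task(0.2)) for _ in range(3)]
    await asyncio.sleep(0)  # let the workers start and register themselves
    drained = await active.drain(timeout=5, poll=0.05)
    await asyncio.gather(*workers)
    return drained, active.count


print(asyncio.run(main()))  # (True, 0)
```

Polling keeps the sketch close to the project's loop; an asyncio.Event set when the count reaches zero would avoid the wake-ups.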
🔥 PRO UPGRADE / TEASER

Draining traffic is great, but how do you know which traffic to drain? Tomorrow, we look at the eyes of the system: Day 26: Distributed Tracing & Observability.

Architectural Consulting

If you are building a data-intensive AI application and require a Senior Engineer to architect your secure, high-concurrency backend, I am available for direct contracting.

