Why FastAPI Background Tasks Break for Real AI Jobs
FastAPI's built-in BackgroundTasks run in-process — if the server restarts mid-job, the task is silently lost with no retry and no queue.
See also: async AI pipelines for multi-hour jobs and deploying FastAPI + Celery on Render.
BackgroundTasks are not a task queue. They are a convenience wrapper that schedules a function to run after the response returns, still inside the same uvicorn worker process. Deploy a new version, hit an OOM, or restart the container — the in-flight AI job vanishes. No message in Redis. No retry. No audit trail. For a twenty-second OpenAI call on a demo, that is acceptable. For 161 sequential GPT-4o completions over two to four hours after a Stripe payment, it is a refund waiting to happen.
Three failure modes pushed me off BackgroundTasks entirely:
- Server restart loses the task. Rolling deploys on Render kill the process. The customer paid; the report never generates.
- No retry on transient API errors. OpenAI returns 429 or 503. BackgroundTasks do not retry unless you write retry logic inside every function — duplicated across every pipeline.
- No concurrency control across products. A four-hour PDF assembly job and a twenty-second fast report share the same process thread pool unless you architect around it manually.
Celery with Redis as broker and result backend fixes all three. Messages persist in Redis until a worker acknowledges success. Workers restart independently from the API. Queues route workloads by product. When a customer pays via Stripe, the webhook handler returns in milliseconds; the Celery message sits in Redis until a dedicated worker picks it up, runs the full pipeline, and acknowledges completion. If that worker dies at minute 90 of a four-hour job, the message re-queues — it does not disappear with the process.
I hit this wall building an AI report generation SaaS where the shortest product finishes in fifteen seconds and the longest runs 161 GPT-4o calls plus 173 WeasyPrint PDF chunks. FastAPI BackgroundTasks cannot survive that range. Hassan Raza documents related async patterns for Next.js on hassanr.com; this post is the Python side of production AI pipelines where losing a paid order is not an option.
| FastAPI BackgroundTasks | Celery + Redis | |
|---|---|---|
| Persistence | ❌ In-process, lost on restart | ✅ Messages persisted in Redis |
| Retry on failure | ❌ Manual or none | ✅ Built-in with configurable backoff |
| Worker crash recovery | ❌ Task silently dropped | ✅ task_acks_late + reject_on_worker_lost |
| Queue routing | ❌ Not supported | ✅ Route by product, priority, or team |
| Concurrency control | ❌ Shares server thread pool | ✅ Per-worker, per-queue tuning |
| Time limits | ❌ None | ✅ Soft + hard per task |
| Monitoring | ❌ No built-in | ✅ Flower, structlog, Celery events |
| Deployment | Simple (same process) | 1 API + N dedicated worker services |
| Best for | Fast tasks (<5s), non-critical | AI jobs, long tasks, critical pipelines |
One Queue Per Product, One Worker Per Queue
Routing different AI workloads to dedicated queues prevents a four-hour job from blocking a twenty-second job on the same worker.
Why queue isolation matters
The platform I built has three report products with wildly different AI workloads:
- Fast report: four parallel OpenAI Agents SDK calls via
asyncio.gather()— ~15–20 seconds —queue.life_clarity - Medium report: fifty-two sequential calls, batched five-wide with 1.5s inter-batch sleep — ~110–136 seconds —
queue.personal_blueprint - Long report: 161 strictly sequential calls, 2.0s sleep between batches — 141–190 minutes —
queue.personal_horoscope
Without queue routing, a single long-running horoscope task blocks every other task on that worker. A customer who just paid for a twenty-second fast report waits hours because a 161-call job occupies the only worker thread. Product-specific queues plus dedicated workers guarantee fast jobs stay fast regardless of how many long jobs are in flight.
The 5 Render services (1 API + 4 dedicated workers)
Five services on Render: one FastAPI API (uvicorn, two workers, starter plan) plus four Celery workers — clarity (concurrency 2), blueprint (concurrency 1), horoscope (concurrency 1, standard plan for RAM), bundle (concurrency 2). The horoscope worker runs on a standard plan because WeasyPrint PDF assembly peaks at 1.5–2 GB RAM assembling 173 chunks. Starter-plan RAM cannot survive that workload reliably.
Bundle products dispatch all three sub-tasks to their respective queues independently, then send one combined delivery email when all three finish. The bundle coordinator itself runs on queue.bundle with soft limit 120s and hard limit 180s — it orchestrates, it does not generate. A customer buying the bundle gets three parallel pipelines: the fast report on the clarity worker, the medium report on the blueprint worker, and the long report on the horoscope worker. None blocks the others because each queue has its own consumer process.
This was the single most important architecture decision I made after watching a twenty-second job queue behind a four-hour job on a shared worker. Dedicated queues cost more in Render services — five instead of two — but the alternative is angry customers who paid and waited hours for a product that should have delivered in seconds. For B2C AI SaaS where Stripe triggers generation immediately after payment, queue isolation is not premature optimization; it is customer retention.
Concurrency on the long-running horoscope worker must stay at 1. With 1.5–2 GB peak RAM per task, setting concurrency=2 runs two PDF assemblies simultaneously — OOM crash on the worker, task re-queues, customer waits another four hours. Scale horizontally by adding more horoscope worker services on Render, never vertically by raising concurrency above 1.
The Celery Configuration That Makes Tasks Reliable
Three settings in a FastAPI Celery Redis AI jobs production setup — task_acks_late, task_reject_on_worker_lost, and worker_prefetch_multiplier=1 — are the difference between a Celery deployment that loses work and one that does not.
What each setting does (and why the default is wrong)
# app/workers/celery_app.py
import structlog
from celery import Celery
from app.core.config import settings
from app.workers.queues import CeleryQueue
logger = structlog.get_logger()
celery_app = Celery(
"pulseclarity",
broker=settings.REDIS_URL,
backend=settings.REDIS_URL,
)
celery_app.conf.update(
# JSON only — no pickle deserialization attacks
task_serializer="json",
result_serializer="json",
accept_content=["json"],
# Message stays in Redis until task returns success — not when worker picks it up
task_acks_late=True,
# If worker process dies mid-task, message re-queues instead of vanishing
task_reject_on_worker_lost=True,
# Worker takes exactly 1 task — prevents hoarding long jobs from other queues
worker_prefetch_multiplier=1,
result_expires=3600, # clear stale results after 1 hour
# Global defaults — overridden per task for long reports
task_soft_time_limit=600, # 10 min — raises SoftTimeLimitExceeded
task_time_limit=660, # 11 min — hard kill
task_routes={
"app.workers.tasks.generate_life_clarity": {"queue": CeleryQueue.LIFE_CLARITY},
"app.workers.tasks.generate_blueprint": {"queue": CeleryQueue.PERSONAL_BLUEPRINT},
"app.workers.tasks.generate_horoscope": {"queue": CeleryQueue.PERSONAL_HOROSCOPE},
"app.workers.tasks.coordinate_bundle": {"queue": CeleryQueue.BUNDLE},
},
)
logger.info("celery_app_configured", broker=settings.REDIS_URL.split("@")[-1])
task_acks_late=True: the broker message is not removed until the task function returns successfully. Default early ack removes the message when the worker starts — crash mid-run and the job is gone forever.
task_reject_on_worker_lost=True: if the worker process dies (OOM, SIGKILL, deploy), the message returns to the queue for another worker.
worker_prefetch_multiplier=1: each worker child process prefetches exactly one message. Without this, a worker with concurrency 4 might grab four long horoscope jobs and block its queue while fast jobs starve elsewhere.
Per-product time limits
Global soft limit: 600s. Global hard limit: 660s. Blueprint overrides: soft 480s, hard 540s. Horoscope overrides: soft 21,600s (6 hours), hard 22,500s (6.25 hours). Override per task via the decorator — pass soft_time_limit=21_600 and time_limit=22_500 on the @celery_app.task decorator for long reports; do not rely on global defaults for jobs that legitimately run longer than ten minutes.
Soft limit raises SoftTimeLimitExceeded inside your task so you can checkpoint partial state and call self.retry() cleanly. Hard limit sends SIGKILL — no cleanup hook runs, no finally block executes. Set soft limits with enough headroom to persist the last completed section to MongoDB before the hard kill arrives. I learned this after a horoscope task hit the global 660s hard limit during testing and lost forty minutes of completed API calls because I had not yet implemented section checkpointing.
Redis serves double duty here: broker for task messages and backend for result storage. Same REDIS_URL instance keeps infrastructure simple on Render. Results expire after 3,600 seconds — you do not want stale task results accumulating in Redis forever. Structured logging via structlog on worker startup confirms which broker endpoint each worker connected to without logging credentials.
How to Define a Reliable Task: Retries, Queues, and Startup
A production Celery task needs max_retries, acks_late, reject_on_worker_lost, and a startup hook that reconnects MongoDB before the first task runs.
# app/workers/tasks.py
from celery import Task
from celery.exceptions import SoftTimeLimitExceeded
from openai import OpenAIError
from app.workers.celery_app import celery_app
from app.pipelines.horoscope_runner import run_horoscope_pipeline
@celery_app.task(
name="app.workers.tasks.generate_horoscope",
bind=True,
max_retries=2,
default_retry_delay=60,
acks_late=True,
reject_on_worker_lost=True,
soft_time_limit=21_600, # 6 hours
time_limit=22_500, # 6.25 hours
queue="queue.personal_horoscope",
)
def generate_horoscope(self, stripe_session_id: str, metadata: dict) -> dict:
try:
result = run_horoscope_pipeline(stripe_session_id, metadata)
return {"status": "complete", "session_id": stripe_session_id}
except SoftTimeLimitExceeded as exc:
raise self.retry(exc=exc, countdown=60)
except (OpenAIError, ConnectionError) as exc:
countdown = 30 * (2 ** self.request.retries)
raise self.retry(exc=exc, countdown=countdown)
# app/workers/worker_init.py
from celery.signals import worker_init
from app.db.mongo import connect_mongo, ensure_indexes
@worker_init.connect
def init_worker(**kwargs):
"""Connect DB after fork — never at module import time."""
connect_mongo()
ensure_indexes()
Worker startup — connecting dependencies before accepting tasks
Every worker connects MongoDB and runs ensure_indexes() inside the worker_init signal before accepting tasks. Module-level MongoClient connections do not survive fork() on Linux — you get "MongoClient opened before fork" warnings and intermittent connection failures under load.
All tasks share the same reliability envelope: max_retries=2, acks_late=True, reject_on_worker_lost=True. Fast and medium reports use default_retry_delay=30; the long horoscope report uses 60 seconds because each retry may resume from a checkpoint rather than starting cold. Inside the service layer, tenacity handles exponential backoff on individual OpenAI calls. Celery retry is the outer safety net — if the entire pipeline throws after all inner retries exhaust, Celery re-queues the whole task with countdown=30 * (2 ** self.request.retries).
The FastAPI server runs uvicorn with two workers on a starter plan — it handles HTTP, Stripe webhooks, and customer polling endpoints. It never runs AI generation. That separation keeps API latency stable even when four horoscope jobs are running simultaneously across horizontally scaled worker services.
Always connect your database inside the worker startup signal, not at module level. Module-level connections break after fork() and produce silent failures in production. The same rule applies to Redis clients, HTTP session pools, and any long-lived connection — initialize per worker child, not per import.
Idempotent Tasks and the Stripe → Celery Dispatch Pattern
A Stripe webhook can fire multiple times for the same payment — your dispatch logic must check database status before dispatching to prevent double-processing the same order.
The idempotency check (status guard before dispatch)
# app/api/routes/stripe_webhook.py
from fastapi import APIRouter, HTTPException, Request
import stripe
from app.core.config import settings
from app.models.report_status import ReportStatus
from app.repositories.payment_repo import payment_repo
from app.workers.tasks import generate_life_clarity, generate_horoscope
router = APIRouter()
@router.post("/webhook")
async def stripe_webhook(request: Request):
payload = await request.body()
sig = request.headers.get("stripe-signature")
try:
event = stripe.Webhook.construct_event(
payload, sig, settings.STRIPE_WEBHOOK_SECRET
)
except stripe.error.SignatureVerificationError:
raise HTTPException(status_code=400)
if event["type"] == "checkout.session.completed":
await _handle_checkout_completed(event["data"]["object"])
# Always 200 — Stripe won't retry based on our response; Celery handles failures
return {"received": True}
async def _handle_checkout_completed(session: dict):
session_id = session["id"]
payment = await payment_repo.find_by_session_id(session_id)
# IDEMPOTENCY GUARD: skip if already processing or done
if payment.report_status != ReportStatus.PENDING:
return
# CORRECT ORDER: update status FIRST, then dispatch
await payment_repo.update_status(session_id, ReportStatus.GENERATING)
if payment.report_type == "fast_report":
generate_life_clarity.apply_async(
args=[session_id, payment.metadata],
queue="queue.life_clarity",
)
elif payment.report_type == "long_report":
await send_order_confirmation_email(payment.customer_email)
generate_horoscope.apply_async(
args=[session_id, payment.metadata],
queue="queue.personal_horoscope",
)
Why the webhook always returns 200
Stripe retries any non-200 response. If your database is down or Redis is full and you return 500, Stripe fires the webhook again — potentially dispatching twice if your idempotency check races. Return 200 always; log errors internally; let Celery's retry mechanism handle AI failures. The webhook's job is to acknowledge receipt, not to guarantee downstream success.
The dispatch pattern I use: load payment by stripe_session_id, check report_status, update to generating, route to the product-specific queue via apply_async(args=[...], queue=...). Long reports send a confirmation email first — telling the customer to expect two to four hours — before the horoscope task enters the queue. Admin re-run tools use the same idempotency guard: if status is not pending, the dispatch is skipped. Combined with task_acks_late, a task cannot be double-processed even if a worker crash triggers a re-queue mid-run — the database status is already generating, so a duplicate webhook dispatch never fires a second task.
Never dispatch a Celery task before updating the database status. If apply_async succeeds but the status update fails, and Stripe retries the webhook, your idempotency check still sees status=pending and dispatches a second time. The correct order is always: update status to generating → then dispatch task.
Crash Recovery: Checkpoint Progress at the Section Level, Not the Task Level
For AI jobs with fifty-plus API calls, crash recovery must happen at the individual call level — not the task level — or a crash at call 140 of 161 means restarting from zero and paying for 140 API calls again.
Section-level upsert pattern
The naive approach: if a worker crashes at task 140 of 161, Celery retries the entire task from call zero. The correct approach: after each AI call completes, upsert the result to MongoDB immediately. On retry or restart, the runner loads existing sections and skips completed ones. The loop pattern looks like this: for each section, check whether it already exists in the database with status complete — if yes, skip; if no, call the model, then upsert the result as a checkpoint before moving to the next section.
Blueprint runners use PersonalBlueprintRepository.upsert_section(stripe_session_id, section_id, data) after each of the fifty-two calls. Horoscope runners use PersonalHoroscopeRepository.upsert_batch(stripe_session_id, batch_id, batch_data) per batch across 161 calls. Medium reports batch five calls with asyncio.gather() and sleep 1.5 seconds between batches — eleven batch rounds total. Long reports run strictly sequential with 2.0 seconds of inter-batch sleep alone accounting for 320 seconds of deliberate pacing before model latency.
Tenacity handles exponential backoff inside the service layer on transient OpenAI errors. Celery retry is the outer safety net with max_retries=2. The two layers solve different problems: tenacity retries a single failed API call within a running task; Celery retries the entire task if the worker process dies or the pipeline throws after inner retries exhaust.
The real-world impact
For a 161-call job: crash at call 140 costs 21 more API calls (~$3–5) on retry instead of 161 (~$15–25). Without checkpointing, every worker OOM during PDF assembly — peak RAM hits 1.5–2 GB rendering 173 WeasyPrint chunks — would restart the entire pipeline from call zero. With checkpointing, the retry resumes at call 141, renders only the remaining PDF sections, and delivers the report the customer already paid for.
Section checkpointing is not optional for any pipeline with fifty-plus calls. I consider it mandatory the moment a single order exceeds ten API calls. One open TODO remains in the bundle coordinator: failure notifications to customers are not implemented — bundle sub-report failures log but do not email the customer yet. The webhook always returns 200 regardless; Stripe will not retry downstream errors, and Celery handles AI failures internally.
Celery retry saves your task. Section-level checkpointing saves your API budget. For AI jobs with 50+ calls, you need both — the task will retry, but without checkpointing it pays for every call it already completed.
Frequently Asked Questions
Install celery[redis], point broker and backend at Redis, define tasks, and dispatch with apply_async(). Hassan Raza runs production AI pipelines this way—a task persists in Redis, not in the FastAPI process. Set task_acks_late=True so messages stay queued until success, task_reject_on_worker_lost=True to re-queue on worker death, and worker_prefetch_multiplier=1 so workers don't hoard long jobs. Connect MongoDB inside a worker_init signal, not at module import—module-level clients break after fork(). The webhook returns immediately; Celery workers process 161 GPT-4o calls while uvicorn stays responsive.
Celery with Redis is the default for production Python AI workloads with mixed runtimes. Hassan Raza routes twenty-second reports and four-hour PDF jobs to separate queues—only Celery makes that routing trivial at scale. Built-in retry with backoff, soft and hard time limits, and Flower monitoring cover production needs ARQ and RQ lack. ARQ suits pure async shops wanting lighter setup; Dramatiq is simpler but less mature. For diverse AI task durations on the same platform, Celery wins.
Use a database status guard before dispatch and task_acks_late for broker reliability. Hassan Raza checks report_status is pending before Stripe checkout.session.completed dispatches—already generating or complete orders skip entirely. Always return HTTP 200 from Stripe webhooks; Celery retries handle downstream failures, not Stripe's retry storm. Update database status to generating before apply_async—never dispatch first. task_acks_late prevents message loss on crash but does not stop duplicates; the status guard does.