Retries and Recovery

Retry Model

Retries are controlled by RestartPolicy.

Core behavior:

  • retries stop at max_retries
  • countdown is linear or exponential based on exponential_backoff
  • optional jitter randomizes retry timing
  • optional exception allowlist narrows retry scope
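
The behavior listed above can be sketched in a few lines. This is a minimal illustration, not the library's RestartPolicy: the class name RetrySketch, the fields base_delay and retry_on, the doubling factor, and the "full jitter" strategy are all assumptions made for the example. Only max_retries and exponential_backoff are names taken from this page.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple, Type


@dataclass
class RetrySketch:
    """Hypothetical stand-in for a retry policy; not the real RestartPolicy."""

    max_retries: int = 3
    base_delay: float = 2.0  # assumed field: seconds before the first retry
    exponential_backoff: bool = True
    jitter: bool = False
    # Assumed field: exception allowlist; None means "retry on any exception".
    retry_on: Optional[Tuple[Type[BaseException], ...]] = None

    def should_retry(self, attempt: int, exc: BaseException) -> bool:
        # `attempt` is the number of retries already made.
        if attempt >= self.max_retries:
            return False
        # With an allowlist, only the listed exception types are retried.
        if self.retry_on is not None and not isinstance(exc, self.retry_on):
            return False
        return True

    def countdown(self, attempt: int, rng: Optional[random.Random] = None) -> float:
        if self.exponential_backoff:
            # base, 2*base, 4*base, ...
            delay = self.base_delay * (2 ** attempt)
        else:
            # base, 2*base, 3*base, ...
            delay = self.base_delay * (attempt + 1)
        if self.jitter:
            # Assumed strategy: pick uniformly between 0 and the computed delay.
            delay = (rng or random).uniform(0.0, delay)
        return delay
```

Under these assumptions, a base delay of 2 seconds gives countdowns of 2, 4, 8 seconds with exponential backoff and 2, 4, 6 seconds without it. Jitter spreads each retry somewhere between 0 and that value so that many failing jobs do not retry at the same instant.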

Profiles

CeleryLanguage defines profile presets:

  • none
  • standard
  • aggressive

Profiles are merged first; explicit constructor fields then override the profile defaults.

Policy Sources

Policy can come from:

  1. language defaults
  2. profile defaults
  3. explicit constructor fields
  4. per-dispatch policy passed to say(...)
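
One way to picture how these sources combine is a layered merge in which each later source overrides the earlier ones. The sketch below assumes the list order is also the precedence order, which this page confirms only for profiles versus explicit constructor fields. The function resolve_policy, the dictionaries, and every preset value are hypothetical and are not the library's actual defaults.

```python
from typing import Any, Dict, Optional

# Hypothetical language-level defaults (values invented for illustration).
LANGUAGE_DEFAULTS: Dict[str, Any] = {
    "max_retries": 1,
    "exponential_backoff": False,
    "jitter": False,
}

# Hypothetical preset contents; only the profile names come from the docs.
PROFILES: Dict[str, Dict[str, Any]] = {
    "none": {"max_retries": 0},
    "standard": {"max_retries": 3, "exponential_backoff": True},
    "aggressive": {"max_retries": 10, "exponential_backoff": True, "jitter": True},
}


def resolve_policy(
    profile: Optional[str] = None,
    explicit: Optional[Dict[str, Any]] = None,
    per_dispatch: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge policy layers; later layers win (assumed precedence)."""
    merged = dict(LANGUAGE_DEFAULTS)
    if profile is not None:
        merged.update(PROFILES[profile])
    for layer in (explicit, per_dispatch):
        if layer:
            # Skip None values so an unset field does not clobber a default.
            merged.update({k: v for k, v in layer.items() if v is not None})
    return merged
```

For example, resolve_policy("standard", {"max_retries": 5}) keeps the profile's exponential backoff but uses the explicit retry limit of 5.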

Recovery Capabilities

  • Journal.restart_recent(...) for restarting recent failed runs.
  • Language.resubmit(...) for replaying a recorded run from stored payload.
  • The FastAPI example includes admin endpoints for full-job requeue and compiled-node requeue.

Event Signals to Watch

Useful events for operations and diagnostics:

  • job.retry_scheduled
  • subjob.retry
  • job.failed
  • workflow.lowering_failed
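
As a rough illustration of watching these signals, the sketch below tallies the events listed above from a stream of records. The record shape (a dict with a "name" key) and the function tally_events are assumptions; how events are actually delivered is not described on this page.

```python
from collections import Counter
from typing import Dict, Iterable

# Event names taken from the list above.
WATCHED = {
    "job.retry_scheduled",
    "subjob.retry",
    "job.failed",
    "workflow.lowering_failed",
}


def tally_events(events: Iterable[Dict[str, str]]) -> Counter:
    """Count occurrences of watched event names; other events are ignored."""
    counts: Counter = Counter()
    for event in events:
        name = event.get("name")  # assumed record key
        if name in WATCHED:
            counts[name] += 1
    return counts
```

Counts like these could feed a dashboard or an alert threshold, for instance to flag a spike in job.failed.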