Retries and Recovery
Retry Model
Retries are controlled by RestartPolicy.
Core behavior:
- retries stop at max_retries
- countdown is linear or exponential based on exponential_backoff
- optional jitter randomizes retry timing
- optional exception allowlist narrows retry scope
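
The behavior listed above can be illustrated with a minimal sketch. This is not the RestartPolicy implementation: SketchPolicy is a hypothetical stand-in, and only max_retries and exponential_backoff are names taken from this page. The base_delay, jitter, and retry_on fields, the doubling factor, and the "full jitter" strategy are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple, Type


@dataclass(frozen=True)
class SketchPolicy:
    """Hypothetical stand-in for RestartPolicy (not the library's class)."""
    max_retries: int = 3
    exponential_backoff: bool = True
    base_delay: float = 2.0   # assumed: seconds before the first retry
    jitter: bool = False      # assumed flag name
    retry_on: Optional[Tuple[Type[BaseException], ...]] = None  # assumed allowlist


def next_countdown(policy: SketchPolicy, attempt: int, exc: BaseException,
                   rng=random) -> Optional[float]:
    """Return seconds to wait before retry number `attempt` (1-based), or None to stop."""
    # Retries stop once the attempt number exceeds max_retries.
    if attempt > policy.max_retries:
        return None
    # An allowlist, when set, narrows retries to the listed exception types.
    if policy.retry_on is not None and not isinstance(exc, policy.retry_on):
        return None
    if policy.exponential_backoff:
        delay = policy.base_delay * 2 ** (attempt - 1)  # 2, 4, 8, ...
    else:
        delay = policy.base_delay * attempt             # 2, 4, 6, ...
    if policy.jitter:
        # Assumed strategy: pick a random delay between 0 and the computed one.
        delay = rng.uniform(0, delay)
    return delay
```

With the assumed defaults, the third retry waits 8 seconds under exponential backoff and 6 seconds under linear backoff, and a fourth attempt is refused.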
Profiles
CeleryLanguage defines profile presets:
- none
- standard
- aggressive
Profile defaults are merged first; explicit constructor fields then override them.
Policy Sources
Policy can come from:
- language defaults
- profile defaults
- explicit constructor fields
- per-dispatch policy passed to say(...)
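
One way to picture how these sources combine is a layered merge. The sketch below assumes the list above runs from lowest to highest precedence and that layers merge field by field; this page confirms only that explicit constructor fields override profile defaults. The preset values, LANGUAGE_DEFAULTS, and resolve_policy are hypothetical and are not part of the library.

```python
from typing import Any, Dict, Optional

# Hypothetical preset contents; the real profile values are not documented here.
PROFILES: Dict[str, Dict[str, Any]] = {
    "none": {"max_retries": 0},
    "standard": {"max_retries": 3, "exponential_backoff": True},
    "aggressive": {"max_retries": 10, "exponential_backoff": True},
}

# Hypothetical language-level defaults.
LANGUAGE_DEFAULTS: Dict[str, Any] = {"max_retries": 0, "exponential_backoff": False}


def resolve_policy(profile: Optional[str] = None,
                   explicit: Optional[Dict[str, Any]] = None,
                   per_dispatch: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge policy layers; later layers win field by field (assumed ordering)."""
    merged = dict(LANGUAGE_DEFAULTS)
    if profile is not None:
        merged.update(PROFILES[profile])   # profile defaults are merged first
    for layer in (explicit, per_dispatch):
        if layer:
            merged.update(layer)           # explicit fields, then per-dispatch
    return merged
```

For example, resolve_policy(profile="standard", explicit={"max_retries": 5}) keeps the profile's exponential_backoff setting while the explicit max_retries replaces the profile's value.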
Recovery Capabilities
- Journal.restart_recent(...) for restarting recent failed runs.
- Language.resubmit(...) for replaying a recorded run from stored payload.
- FastAPI example includes admin endpoints for full-job requeue and compiled-node requeue.
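
The idea behind these capabilities, selecting recent failed runs and replaying them from a stored payload, can be sketched independently of the library. Nothing below reflects the actual signatures of Journal.restart_recent(...) or Language.resubmit(...), which are elided on this page; RunRecord, recent_failed, and replay are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Dict, List


@dataclass
class RunRecord:
    """Hypothetical journal entry; field names are illustrative."""
    run_id: str
    status: str               # e.g. "failed" or "succeeded"
    finished_at: datetime
    payload: Dict[str, Any]   # stored input, kept so the run can be replayed


def recent_failed(records: List[RunRecord], window: timedelta,
                  now: datetime) -> List[RunRecord]:
    """Return failed runs that finished within `window` of `now`."""
    cutoff = now - window
    return [r for r in records if r.status == "failed" and r.finished_at >= cutoff]


def replay(records: List[RunRecord], submit: Callable[[Dict[str, Any]], None],
           window: timedelta, now: datetime) -> List[str]:
    """Resubmit each recent failed run from its stored payload; return the ids replayed."""
    replayed = []
    for record in recent_failed(records, window, now):
        submit(dict(record.payload))  # copy so the journal entry stays untouched
        replayed.append(record.run_id)
    return replayed


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    journal = [RunRecord("run-1", "failed", now - timedelta(minutes=5), {"x": 1})]
    print(replay(journal, print, timedelta(hours=1), now))
```

Passing `now` explicitly keeps the time window deterministic, which makes the selection logic straightforward to test.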
Event Signals to Watch
Useful events for operations and diagnostics:
- job.retry_scheduled
- subjob.retry
- job.failed
- workflow.lowering_failed