I’ve been having difficulty configuring Superset’s Celery functionality: scheduled reports are never executed. We have a Superset installation set up via the PyPI method, with Celery using RabbitMQ as the broker and Redis as the results backend.
Celery Beat appears to be working fine. While the worker is running, the reports.scheduler task runs every minute and succeeds, enqueuing the scheduled reports. However, reports.execute never seems to run properly: the logs show the consumer receiving the messages, and RabbitMQ reports them in the “unacked” state, but they never move past that state, and emails are generally not sent (except for a few occasions where I did receive them, with errors). There seems to be no bound on how many can be unacked at a time, and quitting the worker reverts them to the “ready” state.
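Presumably the stuck messages are the ones a reservation check would list, since they’ve been prefetched but never execute; something like:

celery --app=superset.tasks.celery_app:app inspect reserved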
The worker command:
celery --app=superset.tasks.celery_app:app worker --pool=prefork -O fair -c 4 --loglevel=INFO --logfile=/tmp/celery.log
The log output:
[2024-09-06 16:53:00,003: INFO/MainProcess] Task reports.scheduler[8acb0e53-920d-4f93-9840-80a82bf02971] received
[2024-09-06 16:53:00,010: INFO/ForkPoolWorker-1] Scheduling alert The first report eta: 2024-09-07 06:53:00
[2024-09-06 16:53:00,012: INFO/ForkPoolWorker-1] Scheduling alert A second report eta: 2024-09-07 06:53:00
[2024-09-06 16:53:00,013: INFO/MainProcess] Task reports.execute[ab9d6839-2e98-4325-9f94-842a602bb3be] received
[2024-09-06 16:53:00,014: INFO/ForkPoolWorker-1] Scheduling alert My local legacy report eta: 2024-09-07 06:53:00
[2024-09-06 16:53:00,014: INFO/MainProcess] Task reports.execute[010bbdf5-cf20-48c3-b3ce-331e929845c7] received
[2024-09-06 16:53:00,016: INFO/MainProcess] Task reports.execute[89184b21-e0e4-455c-a226-a197f9077262] received
[2024-09-06 16:53:00,018: INFO/ForkPoolWorker-1] Task reports.scheduler[8acb0e53-920d-4f93-9840-80a82bf02971] succeeded in 0.013280048966407776s: None
There are some lines in the output reporting that the execute task succeeded, and a few times I have actually received the email (stating that it couldn’t generate the report due to other errors, like “failed to generate csv”), but I can’t see any pattern to this and it mostly doesn’t work.
Here’s the Celery portion of the Superset config file (the config variable just holds credentials, which live in a separate configuration file; a sketch of how it’s loaded follows the config block):
from cachelib.redis import RedisCache
from celery.schedules import crontab

RESULTS_BACKEND = RedisCache(
    host=REDIS['host'],
    port=REDIS['port'],
    key_prefix='superset_results',
)

if 'broker_url' in config:
    class CeleryConfig(object):
        broker_url = config['broker_url']
        imports = (
            "superset.sql_lab",
            "superset.tasks.scheduler",
        )
        # Redis URI, reused from the rate-limit storage setting
        result_backend = RATELIMIT_STORAGE_URI
        worker_prefetch_multiplier = 10
        task_acks_late = True
        task_annotations = {
            "sql_lab.get_sql_results": {
                "rate_limit": "100/s",
            },
        }
        beat_schedule = {
            "reports.scheduler": {
                "task": "reports.scheduler",
                "schedule": crontab(minute="*", hour="*"),
            },
            "reports.prune_log": {
                "task": "reports.prune_log",
                "schedule": crontab(minute=0, hour=0),
            },
        }

    CELERY_CONFIG = CeleryConfig
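For completeness, config is populated from that separate file along these lines (the path and key names here are placeholders, not the real ones):

import json

# Hypothetical credentials file; the actual path and structure differ.
with open("/etc/superset/credentials.json") as f:
    config = json.load(f)

# The REDIS dict used by RESULTS_BACKEND above comes from the same file.
REDIS = config.get("redis", {})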
Other things to consider: I confirmed that emails work using the test script in Superset’s documentation. I also briefly tried using the Redis backend as the broker; no luck.
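That attempt amounted to swapping the broker URL over to Redis, roughly like so (placeholder host and database number):

broker_url = "redis://localhost:6379/0"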