I’m an absolute beginner at using Airflow and have recently been working on an ETL project. I’ve encountered the following error:
[2024-05-21, 13:59:50 UTC] {task_context_logger.py:104} ERROR - Detected zombie job: {'full_filepath': '/usr/local/airflow/dags/main.py', 'processor_subdir': '/usr/local/airflow/dags', 'msg': "{'DAG Id': 'transfermarkt', 'Task Id': 'extract_dataset', 'Run Id': 'manual__2024-05-21T03:58:38.921894+00:00', 'Hostname': 'af499e9db556'}", 'simple_task_instance': <airflow.models.taskinstance.SimpleTaskInstance object at 0x741e09342050>, 'is_failure_callback': True} (See https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/tasks.html#zombie-undead-tasks)
I’m managing the Airflow instance with the Astro CLI. The failing task makes multiple API requests to BigQuery in parallel using Python threads, and I execute it with the PythonOperator. The whole task takes roughly 20 seconds with threading and about a minute without. When I run the task without threading, it works just fine.
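The task roughly follows this pattern (a simplified sketch; `fetch_page` and the page range are stand-ins for my real BigQuery request logic, which I've omitted):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_page(page: int) -> list:
    # placeholder for one API request to BigQuery returning some rows
    return [page]

def extract_dataset() -> list:
    # issue the API requests concurrently on a thread pool
    pages = range(10)
    with ThreadPoolExecutor(max_workers=5) as pool:
        results = list(pool.map(fetch_page, pages))
    # flatten the per-page results into a single list of rows
    return [row for page_rows in results for row in page_rows]
```

This callable is what I pass to the PythonOperator; the zombie error only appears when the thread pool is used.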
Here is the scheduler config I tried changing:

AIRFLOW__SCHEDULER__LOCAL_TASK_JOB_HEARTBEAT_SEC=600
AIRFLOW__SCHEDULER__SCHEDULER_ZOMBIE_TASK_THRESHOLD=2
AIRFLOW__SCHEDULER__ZOMBIE_DETECTION_INTERVAL=5

I’ve tried rerunning the task after tweaking the scheduler config above via environment variables, but it didn’t help.

Thanks in advance for your help!