I am using the managed version of Apache Airflow with Cloud Composer, version composer-2.6.1-airflow-2.6.3.
Some of my tasks are simultaneously in a success state (I can find them and their logs via Browse > Task Instances) and in no state at all: in the grid view of the DAG, there is simply no square representing the task (see screenshot). The tasks are not in the removed state; there is really no square and hence no state.
When I try to rerun the dagrun that contains the suspicious tasks using Clear > Queue up new tasks, nothing happens, which seems fairly normal since the tasks actually ran and actually are in a success state.
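For what it's worth, this is roughly how I compare what the UI shows with what the metadata database records, using the stable REST API (a minimal sketch; the base URL, credentials, dag_id, and run_id are placeholders, and on Composer the webserver is normally fronted by IAM/IAP rather than basic auth):

```python
import requests

# Placeholders: adjust to your Airflow webserver URL, auth, DAG, and run.
BASE = "https://<your-airflow-webserver>/api/v1"
AUTH = ("<user>", "<password>")

# List every task instance the scheduler knows about for this dag run.
resp = requests.get(
    f"{BASE}/dags/my_dag/dagRuns/scheduled__2024-01-01T00:00:00+00:00/taskInstances",
    auth=AUTH,
)
resp.raise_for_status()
for ti in resp.json()["task_instances"]:
    print(ti["task_id"], ti["state"])
```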
Well, one could think it’s only a visual issue that does not matter too much, since I am still able to access the logs of the tasks through the Browse > Task Instances menu.
However, I have some ExternalTaskSensors that never meet their condition because they are waiting for the faulty tasks to be successful.
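For context, the sensors are along these lines (a minimal sketch; the DAG and task ids are placeholders for my real ones):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.sensors.external_task import ExternalTaskSensor

with DAG(
    dag_id="downstream_dag",  # placeholder
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Waits for a task in the upstream DAG to reach the success state;
    # it polls the task_instance table, so a task with no state never satisfies it.
    wait_for_upstream = ExternalTaskSensor(
        task_id="wait_for_upstream_task",
        external_dag_id="upstream_dag",   # placeholder
        external_task_id="faulty_task",   # placeholder
        allowed_states=["success"],
        mode="reschedule",
        timeout=timedelta(hours=6).total_seconds(),
    )
```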
I believe this is all due to an upgrade of the Composer environment. I noticed I had a worker pod that was still trying to connect to an old version of the Airflow database (composer-2-3-5-airflow-2-5-3) even months after the upgrade.
The error message from the worker pod looked like this:

OperationalError: FATAL: database "composer-2-3-5-airflow-2-5-3-550b9f5a" does not exist
I somehow managed to kill this faulty worker pod, and now all the worker pods are talking to the correct Airflow database, but the tasks are still both in a successful state and in no state at all.
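What I effectively did amounts to deleting the stale pod in the environment's GKE cluster so the deployment recreates it with fresh config (a sketch using the Kubernetes Python client; the namespace and label selector are assumptions about a typical Composer 2 setup, so inspect your own cluster to confirm them):

```python
from kubernetes import client, config

# Assumes credentials for the Composer GKE cluster are already configured
# locally (e.g. via `gcloud container clusters get-credentials`).
config.load_kube_config()
v1 = client.CoreV1Api()

# Namespace and label are assumptions; check your cluster for the real values.
NAMESPACE = "composer-2-6-1-airflow-2-6-3-550b9f5a"
pods = v1.list_namespaced_pod(NAMESPACE, label_selector="component=worker")

for pod in pods.items:
    print(pod.metadata.name, pod.status.phase)
    # Uncomment to delete a stale pod and let it be recreated:
    # v1.delete_namespaced_pod(pod.metadata.name, NAMESPACE)
```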
I could set all the ExternalTaskSensors to a successful state by hand, but I feel like this will happen again.
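If I do end up doing it by hand, I believe recent Airflow 2.x exposes a REST endpoint to patch a single task instance's state (a sketch; the ids are placeholders, and setting dry_run to True first previews the change):

```python
import requests

BASE = "https://<your-airflow-webserver>/api/v1"
AUTH = ("<user>", "<password>")  # Composer normally fronts this with IAM/IAP

# Mark one sensor task instance as success.
resp = requests.patch(
    f"{BASE}/dags/downstream_dag/dagRuns/scheduled__2024-01-01T00:00:00+00:00"
    "/taskInstances/wait_for_upstream_task",
    auth=AUTH,
    json={"dry_run": False, "new_state": "success"},
)
resp.raise_for_status()
print(resp.json())
```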
According to the documentation, I should be able to connect to the Airflow database; however, the sql_alchemy_conn parameter shows as < hidden > in my Configuration UI.
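Assuming I can recover the connection string some other way, this is the kind of check I would run against the metadata database (a sketch with SQLAlchemy; the connection string, dag_id, and run_id are placeholders, and the table and column names are the standard Airflow 2 schema):

```python
from sqlalchemy import create_engine, text

# Placeholder: the real sql_alchemy_conn of the Composer environment.
engine = create_engine("postgresql+psycopg2://user:pass@host/dbname")

with engine.connect() as conn:
    # Show what the scheduler actually recorded for the suspicious run.
    rows = conn.execute(
        text(
            "SELECT task_id, state, start_date, end_date "
            "FROM task_instance "
            "WHERE dag_id = :dag_id AND run_id = :run_id"
        ),
        {"dag_id": "upstream_dag", "run_id": "scheduled__2024-01-01T00:00:00+00:00"},
    )
    for task_id, state, start, end in rows:
        print(task_id, state, start, end)
```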
Does someone have any clue? Is there an official way to make an Airflow task fully recognized as successful?
How can I prevent this from happening again?