We are running an Apache Airflow server with several workers and quite a few complex DAGs.
Recently we have noticed that some DAGs are missing runs. There are no errors; the run simply doesn't exist, so it can't be rerun. While watching the UI, we saw that at the time the run was supposed to start (it was scheduled and shown as the 'next run'), the 'next run' just changed to the one 3 hours later, and the due run was never started.
The only way we can execute the missing run is to create a new DAG that starts before the missing run time, and then remove it again when it is done.
We have searched the web and looked through the logs, but so far without success. We have also monitored the servers involved and the network, but everything seems fine, and all the other DAGs run as intended.