I have a Glue Catalog table that is updated by an upstream process multiple times a day. The table is partitioned on two columns, business_date and source. The upstream process never appends data to an existing partition; if a partition already exists, it is overwritten. The upstream process can also modify partitions for older business_date values (e.g. a business date that is one month old).
I need to trigger an Airflow DAG whenever a partition is updated. Because of security concerns, I cannot trigger the DAG using AWS Lambda. The only option I have found is to schedule the DAG and use Airflow sensors.
I have explored the AwsGlueCatalogPartitionSensor and have also gone through this link: How to trigger a Airflow task only when new partition/data in avialable in the AWS athena table using DAG in python?
AwsGlueCatalogPartitionSensor expects an expression parameter containing the partition details. It checks for that specific partition in the Glue Catalog table, which is a bottleneck for me because I do not know upfront which business_date and source combination the upstream process has updated.
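For reference, this is roughly how I would have to wire the sensor up (a minimal sketch, assuming the Amazon provider package; the import path differs on Airflow 1.x, and the database, table, and partition values below are placeholders). It only succeeds once the exact partition named in expression exists, which is the limitation: the business_date and source values have to be hard-coded when the DAG is written.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.sensors.glue_catalog_partition import (
    AwsGlueCatalogPartitionSensor,
)

with DAG(
    dag_id="wait_for_glue_partition",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    # Succeeds only when this exact partition exists in the Glue Catalog.
    # The expression must name concrete partition values upfront, which is
    # exactly the problem: the updated business_date/source combination is
    # not known in advance.
    wait_for_partition = AwsGlueCatalogPartitionSensor(
        task_id="wait_for_partition",
        database_name="my_database",  # placeholder
        table_name="my_table",        # placeholder
        expression="business_date='2024-01-15' AND source='source_a'",
        poke_interval=300,
        mode="reschedule",
    )
```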