Is it possible to assign each step of a PySpark ETL pipeline to separate tasks of an Airflow DAG?
From what I’ve found, I need to use a SparkSubmitOperator to submit my PySpark script. But what if I want to assign the extract, transform, and load parts of my Spark job to different tasks in my Airflow DAG, so that the DAG reads: start_etl >> createsession >> extract >> transform >> load >> end_etl
instead of: start_etl >> etl >> end_etl?
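For illustration, here is a rough sketch of the DAG shape I have in mind, assuming a recent Airflow 2.x install. The EmptyOperator/PythonOperator choices and the empty callables are just placeholders to show the task layout I'm after, not an approach I know to be correct for Spark:

    # Sketch of the desired DAG layout (placeholder operators/callables only).
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.empty import EmptyOperator
    from airflow.operators.python import PythonOperator


    def create_session():
        """Placeholder: this is where I'd want to create the SparkSession."""
        pass


    def extract():
        """Placeholder: read the source data with Spark."""
        pass


    def transform():
        """Placeholder: apply the Spark transformations."""
        pass


    def load():
        """Placeholder: write the results out."""
        pass


    with DAG(
        dag_id="spark_etl",          # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule=None,               # Airflow 2.4+; older versions use schedule_interval
        catchup=False,
    ) as dag:
        start_etl = EmptyOperator(task_id="start_etl")
        createsession = PythonOperator(task_id="createsession", python_callable=create_session)
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        load_task = PythonOperator(task_id="load", python_callable=load)
        end_etl = EmptyOperator(task_id="end_etl")

        # The dependency chain I'd like to see in the Airflow UI:
        start_etl >> createsession >> extract_task >> transform_task >> load_task >> end_etl

What I don't know is how (or whether) the SparkSession created in one task can be reused by the extract/transform/load tasks, given that SparkSubmitOperator seems to submit the whole script as a single job.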