I had always understood that Actions in a Notebook or Spark App run sequentially. That was my assumption until someone stated on SO – I cannot find the post anymore – that independent Actions can run in parallel within the same Spark App. Supposedly they can.
I just re-tested this in a Databricks Notebook on the same cluster, as follows:
- 2 `count`s on 2 independently created RDDs. From the Stages tab I can see the submits occur in short succession, so there is no parallel processing; they run sequentially.
- 2 Delta `saveAsTable` writes – they also run sequentially, even though all the logic is again independent.
So, why does Spark's DAG processing not see – lazily – that there are 2 independent Actions to run? Is there a parameter for this? ChatGPT – dare I mention it – after much pushing, states that if there are enough resources Spark can decide to run them in parallel. Maybe a Notebook does not allow this.
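For what it's worth, my current understanding (an assumption, not verified against Spark internals) is that each Action is a blocking call on the driver thread, so two Actions issued one after the other in a Notebook cell can never overlap regardless of cluster resources; overlapping them requires submitting each Action from its own driver-side thread. A minimal sketch of that idea, with `fake_action` standing in for a blocking Spark Action such as `rdd.count()` or `df.write.saveAsTable(...)` so it runs without a cluster:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_action(name, seconds=0.2):
    # A Spark Action blocks the calling thread until its job finishes;
    # time.sleep() models that blocking wait here.
    time.sleep(seconds)
    return name

# Sequential: the second "Action" cannot start before the first returns,
# which matches the back-to-back submits seen in the Stages tab.
t0 = time.perf_counter()
results_seq = [fake_action("count_1"), fake_action("count_2")]
sequential = time.perf_counter() - t0

# Threaded: submitting each "Action" from its own driver thread lets
# both jobs be in flight at once (given enough executor resources).
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    results_par = list(pool.map(fake_action, ["count_1", "count_2"]))
parallel = time.perf_counter() - t0

print(parallel < sequential)  # the threaded version overlaps the waits
```

If this reading is right, the answer to my question would be that the DAG scheduler never reorders across Action boundaries at all; it only parallelises jobs that happen to be submitted concurrently.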