The problem
I recently upgraded my apps to run on Spark 3.5.1 + YARN 3.3.6, and I am now observing frequent failures mentioning an "Authorized committer". The apps run PySpark, and the error always happens in the output stage (while writing to S3).
Has anyone had a similar experience after upgrading to a recent version of Spark? I suspect it is related to this change, but I am not entirely sure.
The full error message looks like:
Job aborted due to stage failure: Authorized committer (attemptNumber=0, stage=0, partition=11)
failed; but task commit success, data duplication may happen.
reason=ExecutorLostFailure(2,false,Some(Container container_1715218448129_414335_01_000005
on host: my-yarn-nm-server-1.com was preempted.))
What I have tried
I enabled the magic committer in the hope that it would be more reliable and faster, but I still see the same failure. Any leads on how to resolve this would be highly appreciated.
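For reference, this is roughly how I enabled the magic committer (a minimal sketch; the app name and bucket are placeholders, the committer settings follow the Hadoop S3A committer documentation, and the spark-hadoop-cloud module is assumed to be on the classpath):

from pyspark.sql import SparkSession

# Sketch of the session setup; "my-app" and "my-bucket" are placeholders.
spark = (
    SparkSession.builder
    .appName("my-app")
    # Route s3a:// output through the S3A committer factory and select "magic".
    .config("spark.hadoop.fs.s3a.committer.magic.enabled", "true")
    .config("spark.hadoop.fs.s3a.committer.name", "magic")
    .config(
        "spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a",
        "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
    )
    # Spark-side bindings so DataFrame writes use the path-output committer
    # (these classes live in the spark-hadoop-cloud module).
    .config(
        "spark.sql.sources.commitProtocolClass",
        "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
    )
    .config(
        "spark.sql.parquet.output.committer.class",
        "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
    )
    .getOrCreate()
)

# A trivial write that exercises the output-commit path where the failure occurs.
spark.range(100).write.mode("overwrite").parquet("s3a://my-bucket/output/")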