Related Content

Tag Archive for apache-spark, pyspark, amazon-emr

EMR-Spark Job creating max 1000 partitions/task when AQE is enabled

With AQE enabled, I always see 1000 shuffle partitions/tasks created for my Spark jobs. Whether I run the job on a week of data or a month (four weeks of weekly data), the shuffle partition count is the same: 1000 tasks run either way, and the job is hitting memory issues. Is there a parameter that is setting a maximum of 1000 partitions?
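One place worth checking is the AQE coalescing settings: when adaptive query execution is on, Spark starts from an initial shuffle partition count (`spark.sql.adaptive.coalescePartitions.initialPartitionNum` if set, otherwise `spark.sql.shuffle.partitions`) and then coalesces partitions down at runtime. A fixed 1000 regardless of data size suggests one of these is pinned to 1000 somewhere in the cluster defaults. A minimal sketch of the properties to inspect and override, assuming the value 1000 is coming from cluster-level configuration (the exact defaults on your EMR release should be verified with `spark.conf.get`):

```python
# Sketch: the AQE-related properties that control shuffle partition counts.
# These are standard Spark SQL property names; the values below are
# illustrative, not EMR's actual defaults.
aqe_conf = {
    # AQE master switch and runtime partition coalescing.
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    # Starting partition count before AQE coalesces; if unset, Spark
    # falls back to spark.sql.shuffle.partitions. If your cluster sets
    # this (or shuffle.partitions) to 1000, every job starts at 1000 tasks.
    "spark.sql.adaptive.coalescePartitions.initialPartitionNum": "200",
    "spark.sql.shuffle.partitions": "200",
}

# These could be passed when building the session, e.g.:
#   builder = SparkSession.builder
#   for k, v in aqe_conf.items():
#       builder = builder.config(k, v)
# or on the command line:
#   spark-submit --conf spark.sql.shuffle.partitions=200 ... job.py
for key, value in sorted(aqe_conf.items()):
    print(f"--conf {key}={value}")
```

If the memory errors persist even at lower partition counts, the partitions may simply be too large for the executors; in that case raising the partition count (or executor memory) is the usual lever, not lowering it.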

Pyspark job restart from within script running on EMR

I have a PySpark job that runs on an EMR cluster. Is there any way, from within the script itself, to fail the job and then restart it when a certain condition is met? Currently I throw an exception, but that just fails the job and stops it. I want it to start again automatically.
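One common pattern is to keep the restart logic inside the driver script rather than letting the exception kill the whole EMR step: wrap the job body in a retry loop and raise a dedicated exception type to signal "restart me". A minimal sketch, where `RestartJob`, `run_with_restart`, and the inner `job` function are all hypothetical names, not part of any EMR or Spark API:

```python
import time

class RestartJob(Exception):
    """Raised from inside the job logic to request a restart."""

def run_with_restart(job, max_restarts=3, delay_seconds=0):
    """Call job(); if it raises RestartJob, wait and call it again,
    up to max_restarts restarts. Any other exception still fails the step."""
    restarts = 0
    while True:
        try:
            return job()
        except RestartJob:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up and let the step fail for real
            time.sleep(delay_seconds)

# Toy stand-in for the real PySpark job body: it asks for a restart
# twice, then succeeds on the third attempt.
attempts = []

def job():
    attempts.append(1)
    if len(attempts) < 3:
        raise RestartJob("condition not met yet")
    return "done"

result = run_with_restart(job)
print(result, len(attempts))  # the job body ran three times
```

Note that the SparkSession survives across attempts with this approach, which may or may not be what you want. If the job must restart from a completely fresh JVM/application, the retry has to live outside the script, e.g. EMR step retries or YARN application retries (`spark.yarn.maxAppAttempts`), rather than inside it.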