Tags: apache-spark, amazon-emr, aws-auto-scaling

Why do identical Spark jobs take longer to run on an EMR cluster when they are submitted later?

I have an EMR cluster to which I submit 50 jobs at nearly the same time (about 3 minutes pass between the first submission and the last). I want all the jobs to run in parallel, so I expect them all to take about the same amount of time to complete. Instead, the first 20 jobs take about two and a half minutes each, while the last 30 take anywhere from 6 to 10 minutes. They are all submitted with spark-submit using the following configuration: