I see that the number of tasks in my Spark job is only 1000 after the initial read, whereas the number of cores available is 9000 (1800 executors × 5 cores each). I have enabled AQE and shuffle partition coalescing. In the pic below you can see there are only 1000 tasks running. However, the number of input tasks with a 256 MB split is around 141,000. The code is in PySpark (SQL aggregate functions over data in S3, writing the result back to S3).
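For context, the job is roughly of this shape (the paths, grouping key, and aggregate below are placeholders, not the real ones):

    # Rough shape of the job (placeholder paths and column names), not the actual code.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("s3-aggregation").getOrCreate()

    # ~141,000 input splits at 256 MB each
    df = spark.read.parquet("s3://input-bucket/input-path/")
    # SQL-style aggregation that triggers the shuffle in question
    agg = df.groupBy("key_col").agg(F.sum("value_col").alias("total"))
    agg.write.mode("overwrite").parquet("s3://output-bucket/output-path/")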
My questions:
1. Is the 1000-task count set by Spark as part of AQE's post-shuffle partition coalescing? (A sketch of how I would check the post-shuffle partition count follows below.)
2. If the cluster is only ever going to run 1000 tasks, i.e. use at most 1000 cores, does it make sense to reduce the number of executors? [Note: I have 150 r5.16xlarge instances, each with 64 vCores and 512 GiB of memory.]
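To make question 1 concrete, this is roughly how I would inspect the post-shuffle partition count from the driver, continuing the placeholder `df` and `F` from the sketch above (minimal sketch only; the real query is larger):

    # Sketch (placeholder names): check how many partitions remain after the
    # aggregation shuffle once AQE has (possibly) coalesced them.
    agg = df.groupBy("key_col").agg(F.sum("value_col").alias("total"))
    # Accessing .rdd materializes the shuffle stages, so with AQE enabled the
    # count below should reflect the coalesced (final) plan.
    print(agg.rdd.getNumPartitions())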
The spark-submit command:

    spark-submit --conf spark.sql.files.maxPartitionBytes=268435456 \
      --master yarn --deploy-mode cluster --conf spark.yarn.maxAppAttempts=1 \
      --conf spark.sql.adaptive.enabled=true --conf spark.dynamicAllocation.enabled=false \
      --conf spark.sql.parquet.filterPushdown=true \
      --conf spark.sql.adaptive.coalescePartitions.enabled=true \
      --conf spark.sql.adaptive.advisoryPartitionSizeInBytes=256m \
      --conf spark.sql.adaptive.coalescePartitions.minPartitionSize=256m \
      --conf spark.sql.adaptive.coalescePartitions.parallelismFirst=false \
      --conf spark.sql.adaptive.coalescePartitions.minPartitionNum=27000 \
      --conf spark.network.timeout=5400s --conf spark.files.fetchTimeout=600s \
      --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.kryoserializer.buffer.max=1g \
      --conf spark.memory.storageFraction=0.05 --conf spark.memory.fraction=0.8 \
      --conf spark.shuffle.compress=true --conf spark.shuffle.spill.compress=true \
      --conf spark.hadoop.fs.s3.multipart.th.fraction.parts.completed=0.99 \
      --conf spark.sql.objectHashAggregate.sortBased.fallbackThreshold=4000000 \
      --conf spark.reducer.maxReqsInFlight=1 \
      --conf spark.network.timeout=1200s \
      --conf spark.executor.cores=5 \
      --conf spark.executor.instances=1800 \
      --conf spark.executor.memory=32g --conf spark.driver.memory=60g --conf spark.executor.memoryOverhead=4g --conf spark.driver.memoryOverhead=4g
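As a sanity check inside the job (not part of the command above), I can print the effective AQE-related settings from the driver; something roughly like this:

    # Confirm the AQE-related settings actually reached the session.
    for key in [
        "spark.sql.adaptive.enabled",
        "spark.sql.adaptive.coalescePartitions.enabled",
        "spark.sql.adaptive.advisoryPartitionSizeInBytes",
        "spark.sql.adaptive.coalescePartitions.minPartitionSize",
        "spark.sql.adaptive.coalescePartitions.parallelismFirst",
        "spark.sql.adaptive.coalescePartitions.minPartitionNum",
        "spark.sql.shuffle.partitions",
    ]:
        print(key, "=", spark.conf.get(key, "<not set>"))

This only shows what the session sees; the actual coalesced partition count still depends on the runtime shuffle statistics AQE collects.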