One of my Spark jobs failed (see emr-spark-shuffle-fetchfailedexception-with-65tb-data-with-aqe-enabled) and shows a high Shuffle Read Fetch Wait Time. Is there any way it can be improved?
spark-submit command:
spark-submit \
  --conf spark.sql.files.maxPartitionBytes=268435456 \
  --master yarn --deploy-mode cluster \
  --conf spark.yarn.maxAppAttempts=1 \
  --conf spark.sql.adaptive.enabled=true \
  --conf spark.dynamicAllocation.enabled=false \
  --conf spark.sql.parquet.filterPushdown=true \
  --conf spark.sql.adaptive.coalescePartitions.enabled=true \
  --conf spark.sql.adaptive.advisoryPartitionSizeInBytes=268435456 \
  --conf spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled=true \
  --conf spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor=0.5 \
  --conf spark.sql.adaptive.coalescePartitions.parallelismFirst=false \
  --conf spark.sql.adaptive.coalescePartitions.initialPartitionNum=36000 \
  --conf spark.sql.adaptive.localShuffleReader.enabled=true \
  --conf spark.network.timeout=6000s \
  --conf spark.files.fetchTimeout=600s \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryoserializer.buffer.max=1g \
  --conf spark.memory.storageFraction=0.05 \
  --conf spark.memory.fraction=0.8 \
  --conf spark.shuffle.compress=true \
  --conf spark.shuffle.spill.compress=true \
  --conf spark.hadoop.fs.s3.multipart.th.fraction.parts.completed=0.99 \
  --conf spark.sql.objectHashAggregate.sortBased.fallbackThreshold=4000000 \
  --conf spark.reducer.maxReqsInFlight=1 \
  --conf spark.executor.cores=5 \
  --conf spark.executor.instances=3600 \
  --conf spark.sql.shuffle.partitions=36000 \
  --conf spark.executor.memory=32g \
  --conf spark.driver.memory=60g \
  --conf spark.executor.memoryOverhead=8g \
  --conf spark.driver.memoryOverhead=4g \
  --conf spark.hadoop.fs.s3a.fast.output.enabled=true \
  --conf spark.executor.extraJavaOptions="-XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p' -Djavax.net.ssl.trustStore=/home/hadoop/.config/certs/InternalAndExternalTrustStore.jks" \
  --conf spark.driver.extraJavaOptions="-XX:+UseG1GC" \
  test.py
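
For reference, Shuffle Read Fetch Wait Time is the time reduce tasks sit blocked waiting for remote shuffle blocks, and it is bounded mainly by the reducer-side fetch settings. A minimal sketch of how those knobs could be opened up is below; these are standard Spark configs, but the values are illustrative assumptions, not something this job has been tested with. Note that the command above pins spark.reducer.maxReqsInFlight=1, which allows only one outstanding fetch request per reducer (the default is Int.MaxValue):

# Sketch only - illustrative values, not what the job above currently runs.
# Defaults: maxSizeInFlight=48m, maxReqsInFlight=Int.MaxValue,
# maxBlocksInFlightPerAddress=Int.MaxValue, io.maxRetries=3, io.retryWait=5s.
--conf spark.reducer.maxReqsInFlight=2147483647 \
--conf spark.reducer.maxSizeInFlight=96m \
--conf spark.reducer.maxBlocksInFlightPerAddress=1000 \
--conf spark.shuffle.io.maxRetries=10 \
--conf spark.shuffle.io.retryWait=30s

Reverting spark.reducer.maxReqsInFlight from 1 to its default would be the first change to test, since serializing fetch requests per reducer would be expected to inflate fetch wait time; capping maxBlocksInFlightPerAddress instead is a common way to avoid overloading individual shuffle servers without giving up fetch parallelism.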