I have set spark.sql.adaptive.advisoryPartitionSizeInBytes=268435456 (256 MB) and spark.sql.adaptive.enabled=true. However, the data size of each partition is more than 256 MB. In the DAG, I can see that AQEShuffleRead creates fewer partitions than spark.sql.adaptive.coalescePartitions.initialPartitionNum.
Question: My understanding is that enabling AQE reduces the number of partitions only when there are small partitions to coalesce. Here the data size per partition is already quite high. Can anyone shed some light on this?
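To make my expectation concrete, here is a minimal sketch of how I assume the coalescing behaves. It is a simplified greedy model written for illustration, not Spark's actual implementation: the function name coalesce_partitions and the 8 MiB per-partition shuffle size are hypothetical, and only the advisory size and the initial partition count come from my configuration.

```python
# Simplified model of partition coalescing (an assumption, not Spark's real code):
# adjacent shuffle partitions are merged until adding the next one would
# push the merged size above the advisory size.

ADVISORY_SIZE = 268435456   # spark.sql.adaptive.advisoryPartitionSizeInBytes
INITIAL_PARTITIONS = 30000  # spark.sql.adaptive.coalescePartitions.initialPartitionNum


def coalesce_partitions(sizes, advisory_size):
    """Return the sizes of the merged partitions under the greedy model."""
    merged = []
    current_size = 0
    current_count = 0
    for size in sizes:
        # Close the current group if the next partition would overshoot the target.
        if current_count > 0 and current_size + size > advisory_size:
            merged.append(current_size)
            current_size = 0
            current_count = 0
        current_size += size
        current_count += 1
    if current_count > 0:
        merged.append(current_size)
    return merged


# Hypothetical input: every initial shuffle partition holds 8 MiB.
sizes = [8 * 1024 * 1024] * INITIAL_PARTITIONS
coalesced = coalesce_partitions(sizes, ADVISORY_SIZE)

print("partitions after coalescing:", len(coalesced))
print("largest coalesced partition (bytes):", max(coalesced))
```

Under this model the partition count drops well below 30000, which matches what I see in the DAG, but no merged partition should exceed the advisory size. That second part is what does not match my job.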
spark-submit command:
spark-submit --conf spark.sql.files.maxPartitionBytes=268435456 \
--master yarn --deploy-mode cluster --conf spark.yarn.maxAppAttempts=1 \
--conf spark.sql.hive.metastorePartitionPruning=false \
--conf spark.sql.adaptive.enabled=true --conf spark.dynamicAllocation.enabled=false \
--conf spark.sql.parquet.filterPushdown=true \
--conf spark.sql.adaptive.coalescePartitions.enabled=true \
--conf spark.sql.adaptive.advisoryPartitionSizeInBytes=268435456 \
--conf spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled=true \
--conf spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor=.5 \
--conf spark.sql.adaptive.coalescePartitions.parallelismFirst=false \
--conf spark.sql.adaptive.coalescePartitions.initialPartitionNum=30000