I have a Spark job where I passed all the configuration parameters through spark-submit, whereas in another job (same code, same data) I created the Spark session myself and added the same config programmatically. However, the execution times of the two jobs are quite different. Does the way the configuration is passed really matter for performance?
spark-submit --conf spark.sql.files.maxPartitionBytes=268435456 \
  --master yarn --deploy-mode cluster --conf spark.yarn.maxAppAttempts=1 \
  --conf spark.sql.hive.metastorePartitionPruning=false \
  --conf spark.sql.adaptive.enabled=true --conf spark.dynamicAllocation.enabled=false \
  --class com.lslsamamance s3://seaembly-1.0.jar --ren_id 1
Here is how I created the Spark session in the other job:
val spark: SparkSession = SparkSession.builder()
  .appName(appName)
  .enableHiveSupport()
  .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
  .config("spark.sql.hive.metastorePartitionPruning", "false")
  .config("spark.sql.adaptive.enabled", "true")
  .config("spark.dynamicAllocation.enabled", "false")
  .config("spark.sql.parquet.filterPushdown", "true")
  .getOrCreate()
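One thing I considered checking is whether all of the settings actually take effect in both jobs, since some properties are only read when the SparkContext starts and the two jobs don't set exactly the same keys. A minimal sketch (assuming the spark session above is in scope; the list of keys is just the ones being compared):

// Print the effective value of each setting; keys that were never set
// fall back to the "<not set>" placeholder.
Seq(
  "spark.sql.files.maxPartitionBytes",
  "spark.sql.adaptive.enabled",
  "spark.dynamicAllocation.enabled",
  "spark.sql.hive.metastorePartitionPruning"
).foreach { key =>
  println(s"$key = ${spark.conf.get(key, "<not set>")}")
}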