I am trying to suppress Spark's logging by supplying my own log4j.properties
file.
gcloud dataproc jobs submit spark \
  --cluster test-dataproc-cluster \
  --region europe-north1 \
  --files gs://test-spark-logging-bucket/log4j.properties \
  --properties spark.sql.legacy.allowUntypedScalaUDF=true,'spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties,spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties' \
  --class com.pythian.edp.pm.spark.job.TestSparkJob \
  --jars gs://18621ad39476e29c-test-static/spark-jobs/sparkBigQueryConnector/spark-bigquery-assembly-0.11.1-beta-SNAPSHOT.jar,gs://18621ad39476e29c-test-static/spark-jobs/digiSparkPmCmProcessing/digiSparkPmCmProcessing-assembly-0.1.0-SNAPSHOT.jar \
  -- --configurationURI gs://18621ad39476e29c-test-static/spark-job-configs/f9fabc4f-162e-40f3-af69-237e4c464c9e-PM-LTE-CELLCQI/ing-15min-cell-1719203688539.yml \
  --jobType "ingestion" --dataType "PM" \
  --sendPubSubNotification --pubSubProjectId bmas-eu-digi-pipe-uat --pubSubTopicName data-notifications \
  --traceDatasets > test_log_ingestion_job_log 2>&1
The above command works fine, but I am struggling to set the same options from Python code.
Below is the snippet of the code I am working with.
job_properties.update({
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "0",
    "spark.dynamicAllocation.maxExecutors": "5",
    "spark.executor.instances": "0",
    "spark.sql.legacy.allowUntypedScalaUDF": "true",
    # Point both the driver and the executor JVMs at the custom log4j.properties
    "spark.driver.extraJavaOptions": "-Dlog4j.configuration=file:log4j.properties",
    "spark.executor.extraJavaOptions": "-Dlog4j.configuration=file:log4j.properties",
})
# log_file_location='gs://test-spark-logging-bucket/log4j.properties'
job_details = {
    'placement': {
        'cluster_name': cluster_name,
    },
    'reference': {
        'job_id': job_id,
    },
    'scheduling': {
        'max_failures_per_hour': 1,
    },
    'labels': labels,
    'spark_job': {
        'args': spark_job_arguments,
        'main_class': SPARK_JOB_CLASSNAME,
        'jar_file_uris': [
            os.path.join(self.dataproc_job_jar_file_prefix, file) for file in jar_files
        ],
        'properties': job_properties,
    }
}
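
For comparison with the gcloud command, here is a minimal sketch of how its --files and --properties flags might map onto the 'spark_job' dict. It assumes the dict follows the Dataproc SparkJob message, where 'file_uris' is the counterpart of --files. The helper build_spark_job and the gs://example-bucket/app.jar placeholder are hypothetical names used only for illustration; the log4j.properties URI and the property values are taken from the command above.

```python
# Sketch only: builds the 'spark_job' part of a Dataproc job request.
# Assumption: 'file_uris' is the SparkJob field that corresponds to --files.
LOG4J_URI = "gs://test-spark-logging-bucket/log4j.properties"
LOG4J_OPTION = "-Dlog4j.configuration=file:log4j.properties"


def build_spark_job(main_class, jar_uris, args, extra_properties=None):
    """Hypothetical helper that mirrors the gcloud flags in a plain dict."""
    properties = {
        "spark.sql.legacy.allowUntypedScalaUDF": "true",
        # Same JVM option for driver and executors, as in --properties
        "spark.driver.extraJavaOptions": LOG4J_OPTION,
        "spark.executor.extraJavaOptions": LOG4J_OPTION,
    }
    properties.update(extra_properties or {})
    return {
        "main_class": main_class,
        "jar_file_uris": list(jar_uris),
        # Files listed here are staged into each task's working directory,
        # which is why the option can use the relative path file:log4j.properties
        "file_uris": [LOG4J_URI],
        "args": list(args),
        "properties": properties,
    }


spark_job = build_spark_job(
    "com.pythian.edp.pm.spark.job.TestSparkJob",
    ["gs://example-bucket/app.jar"],  # placeholder jar URI, not from the post
    ["--jobType", "ingestion", "--dataType", "PM"],
    extra_properties={"spark.dynamicAllocation.enabled": "true"},
)
```

The resulting dict would take the place of the 'spark_job' entry in job_details. Whether this alone suppresses the logging has not been verified here; it only reproduces what the working gcloud command passes.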