Making some progress on a proof of concept for a python dbt model in GCP (BigQuery). Built a dataproc cluster for Spark and able to execute the model, but I’m getting an error in the model that requires a configuration change for Spark. Specifically, I need to set the following:
"spark.sql.legacy.parquet.int96RebaseModeInRead": "CORRECTED"
"spark.sql.legacy.parquet.int96RebaseModeInWrite": "CORRECTED"
"spark.sql.legacy.parquet.datetimeRebaseModeInRead": "CORRECTED"
"spark.sql.legacy.parquet.datetimeRebaseModeInWrite": "CORRECTED"
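For context, one place I believe these could be set is at cluster creation time; a sketch assuming the gcloud CLI, where the `spark:` prefix writes each key into the cluster's `spark-defaults.conf` (cluster name and region here are placeholders):

```shell
# Hypothetical sketch: bake the rebase settings into the Dataproc cluster
# at creation time. CLUSTER_NAME and --region are placeholders.
gcloud dataproc clusters create CLUSTER_NAME \
  --region=us-central1 \
  --properties='spark:spark.sql.legacy.parquet.int96RebaseModeInRead=CORRECTED,spark:spark.sql.legacy.parquet.int96RebaseModeInWrite=CORRECTED,spark:spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED,spark:spark.sql.legacy.parquet.datetimeRebaseModeInWrite=CORRECTED'
```

This would apply the settings to every job on the cluster rather than to one model, which may or may not be what I want.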
I'm uncertain where/how to set these Spark configuration options. Do they belong in the profiles.yml file, can I set them programmatically in the Python model itself, or somewhere else?
I tried setting the following in the profiles.yml file:
analytics_profile:
  outputs:
    dev:
      server_side_parameters:
        "spark.sql.legacy.parquet.int96RebaseModeInRead": "CORRECTED"
        "spark.sql.legacy.parquet.int96RebaseModeInWrite": "CORRECTED"
        "spark.sql.legacy.parquet.datetimeRebaseModeInRead": "CORRECTED"
        "spark.sql.legacy.parquet.datetimeRebaseModeInWrite": "CORRECTED"
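For the programmatic route, this is roughly what I had in mind inside the model itself; a sketch assuming dbt passes the live SparkSession as `session` (the standard `def model(dbt, session)` signature for dbt Python models), with `my_upstream_model` as a hypothetical upstream ref:

```python
# Hypothetical sketch: set the rebase options on the live SparkSession
# from inside the dbt Python model. These are runtime SQL confs, so
# session.conf.set() should apply them without restarting the session.
def model(dbt, session):
    dbt.config(materialized="table")

    for key in (
        "spark.sql.legacy.parquet.int96RebaseModeInRead",
        "spark.sql.legacy.parquet.int96RebaseModeInWrite",
        "spark.sql.legacy.parquet.datetimeRebaseModeInRead",
        "spark.sql.legacy.parquet.datetimeRebaseModeInWrite",
    ):
        session.conf.set(key, "CORRECTED")

    # my_upstream_model is a placeholder for the actual upstream model.
    df = dbt.ref("my_upstream_model")
    return df
```

I don't know whether setting the confs mid-session like this takes effect for the reads/writes dbt performs, which is part of what I'm asking.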