
Tags: apache-spark, pyspark, apache-spark-sql, parquet, azure-synapse

Parquet partition performance with where clause

I’m trying to optimize the performance of a PySpark SQL query over Parquet files in Azure Synapse Analytics. My data set has billions of records, so every bit of performance helps. My basic question: does Parquet's columnar storage really help with a WHERE clause on Year, or must I use the /Year=2023 path with the OPENROWSET method to get a real performance boost?
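
To make the distinction in the question concrete, here is a toy sketch of what directory-level partition pruning amounts to when files sit under Hive-style `Year=...` folders: files outside the requested folder are never opened. The file paths and the helper names `partition_values` and `prune_partitions` are hypothetical, made up for illustration; this is not how Spark or Synapse implement it internally, only the idea behind it.

```python
from pathlib import PurePosixPath


def partition_values(path):
    """Parse Hive-style `key=value` directory segments from a file path."""
    values = {}
    # Skip the last component, which is the file name itself.
    for segment in PurePosixPath(path).parts[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:
            values[key] = value
    return values


def prune_partitions(paths, column, value):
    """Keep only the files whose directory carries `column=value`."""
    wanted = str(value)
    return [p for p in paths if partition_values(p).get(column) == wanted]


# Hypothetical layout, assumed for illustration only.
files = [
    "sales/Year=2021/part-0000.parquet",
    "sales/Year=2022/part-0000.parquet",
    "sales/Year=2023/part-0000.parquet",
    "sales/Year=2023/part-0001.parquet",
]

selected = prune_partitions(files, "Year", 2023)
print(selected)
```

Without such a folder layout, a filter on Year can only rely on the column statistics stored inside each Parquet file, which lets a reader skip row groups but still requires opening every file to read its footer.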