What is the difference between data_page_version=1.0 and 2.0 in parquet files?

In pyarrow, the Parquet writer has a data_page_version parameter, which accepts either “1.0” or “2.0” and defaults to “1.0”. I sometimes save files with “2.0” because ‘hey, a higher version must be better, right?’. Other times I don’t bother setting the option, so I get the default. I’ve never noticed a difference or had a problem either way when using polars, pyarrow, duckdb (occasionally), or Azure Synapse.