I have created a Parquet file using Python polars' .write_parquet method. Python can read it back without a problem, and MATLAB's parquetinfo also returns the file's metadata just fine.
However, when I run parquetread in MATLAB to actually load the data, it fails almost immediately with the error “Unable to read Parquet file” and no further details.
How can I create a Parquet file in Python that MATLAB can read?
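For reference, here is a minimal version of the Python side; the DataFrame contents below are just placeholder data, but the write call is the same one used in my script:

import polars as pl

# Any small DataFrame reproduces the behaviour; the data itself does not matter.
df = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
df.write_parquet("./file.parquet", compression="lz4")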
It turns out the compression codec used for the Parquet file was not compatible with MATLAB R2024a.
In my Python code I wrote:
df.write_parquet("./file.parquet", compression="lz4")
I chose that compression because the docs describe it as fast. Reading further, I found that the documentation of the compression parameter also states (emphasis mine):
Choose “zstd” for good compression performance. Choose “lz4” for fast compression/decompression. Choose “snappy” for more backwards compatibility guarantees when you deal with older parquet readers.
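To confirm which codec a given file was actually written with, the Parquet metadata can be inspected from Python, for example with pyarrow (assuming it is installed); this is a quick diagnostic sketch, not part of the original script:

import pyarrow.parquet as pq

# Prints the compression codec of the first column chunk, e.g. 'SNAPPY'.
print(pq.ParquetFile("./file.parquet").metadata.row_group(0).column(0).compression)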
After setting the compression option to “snappy”, the resulting file could be read by MATLAB. So the line of Python code becomes:
df.write_parquet("./file.parquet", compression="snappy")
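Putting it together, a minimal end-to-end sketch of the Python side (placeholder data again; the read-back is just a sanity check before handing the file to MATLAB):

import polars as pl

df = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})  # placeholder data
# "snappy" keeps the file readable by older/stricter Parquet readers such as MATLAB R2024a's parquetread.
df.write_parquet("./file.parquet", compression="snappy")
print(pl.read_parquet("./file.parquet"))  # round-trip check from Python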