Relative Content

Tag Archive for pyarrow

AttributeError: module 'pyarrow.lib' has no attribute 'ListViewType'

AttributeError                            Traceback (most recent call last)
Cell In[67], line 2
      1 import pandas as pd
----> 2 import pyarrow.parquet as pq
      3 import pyarrow.lib as _lib
      4 import pyarrow as pa

File /opt/anaconda3/lib/python3.11/site-packages/pyarrow/parquet/__init__.py:20
      1 # Licensed to the Apache Software Foundation (ASF) under one
      2 # or more contributor license agreements. See the NOTICE file
[…]

Converting pyarrow Table to RecordBatches of a fixed byte size

I appreciate this might be impossible in the most general case, but is there a way to convert a pyarrow table into record batches of a fixed maximum byte size? I know I can request a maximum number of rows per batch, so this should be possible if I can find the maximum size in bytes of a row for a given schema. I didn’t see anything in the docs of either pa.Schema or pa.Field about their sizes.
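One possible approach, sketched here under assumptions rather than offered as a definitive answer: estimate the rows per batch from the table's average row size, then shrink any slice that still exceeds the byte budget. The helper names `rows_for_budget` and `batches_under` are hypothetical. The sketch assumes the object passed in is a `pyarrow.Table` and relies on its `nbytes`, `num_rows`, `slice`, and `to_batches` members, and on `nbytes` reflecting only the rows in a slice.

```python
from typing import Iterator


def rows_for_budget(total_bytes: int, num_rows: int, max_bytes: int) -> int:
    """Estimate how many rows fit in max_bytes using the average row size."""
    if num_rows == 0 or total_bytes == 0:
        return max(num_rows, 1)
    avg_row_bytes = total_bytes / num_rows
    return max(1, int(max_bytes // avg_row_bytes))


def batches_under(table, max_bytes: int) -> Iterator:
    """Yield record batches that fit in max_bytes where possible.

    `table` is assumed to be a pyarrow.Table. A single row that is larger
    than max_bytes on its own is still emitted, as a one-row batch.
    """
    guess = rows_for_budget(table.nbytes, table.num_rows, max_bytes)
    offset = 0
    while offset < table.num_rows:
        n = min(guess, table.num_rows - offset)
        chunk = table.slice(offset, n)
        # Variable-width columns (strings, lists) can make a slice heavier
        # than the average suggests, so halve the row count until it fits.
        while chunk.nbytes > max_bytes and n > 1:
            n //= 2
            chunk = table.slice(offset, n)
        # A slice may span several underlying chunks; each resulting batch
        # is a subset of the slice.
        yield from chunk.to_batches()
        offset += n
```

For schemas made only of fixed-width types, the average row size is close to the true per-row size and the halving loop rarely triggers. With variable-width columns there is no per-schema maximum row size, which is why the sketch measures each slice instead of reading a size from `pa.Schema` or `pa.Field`.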

Presenting Python enums in PyArrow and Parquet files

Do PyArrow (Arrow) and its Parquet file format have a native way to serialise Python enums? If not, what would be a storage-efficient way to encode enum strings, preferably so that other Parquet readers could also understand them?
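One option that may fit, sketched here as an assumption rather than a confirmed answer, is Arrow's dictionary type: each distinct enum name is stored once and the rows hold small integer indices. The `Color` enum and the helpers `enum_to_codes` and `enum_column` are hypothetical names for illustration. The only pyarrow calls used are `pa.array` and `pa.DictionaryArray.from_arrays`.

```python
from enum import Enum


class Color(Enum):  # hypothetical enum, for illustration only
    RED = "red"
    GREEN = "green"
    BLUE = "blue"


def enum_to_codes(values, enum_cls):
    """Return (indices, names): one integer index per value plus the
    list of member names in definition order."""
    names = [member.name for member in enum_cls]
    position = {name: i for i, name in enumerate(names)}
    indices = [position[value.name] for value in values]
    return indices, names


def enum_column(values, enum_cls):
    """Build a dictionary-encoded Arrow array from enum members."""
    # Imported here so enum_to_codes stays usable without pyarrow installed.
    import pyarrow as pa

    indices, names = enum_to_codes(values, enum_cls)
    # int8 indices are enough for enums with fewer than 128 members.
    return pa.DictionaryArray.from_arrays(
        pa.array(indices, type=pa.int8()),
        pa.array(names),
    )
```

A column built this way can be placed in a `pa.table({...})` and written with `pyarrow.parquet.write_table`. Readers that do not use the Arrow schema stored in the file should see an ordinary string column, so the enum names stay readable; mapping them back to Python enum members (for example `Color[name]`) is left to the reader side.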