Is there a way to validate Python objects against a pyarrow schema?
I have a pyarrow Schema defined and a list of native Python dictionaries. I can use
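The snippet above is cut off, but one common way to check a list of dicts against a schema (not necessarily what the asker had in mind) is simply to attempt the conversion and catch the Arrow error. A minimal sketch, assuming a small example schema:

```python
import pyarrow as pa

schema = pa.schema([
    ("name", pa.string()),
    ("age", pa.int32()),
])

rows = [
    {"name": "alice", "age": 30},
    {"name": "bob", "age": "not-a-number"},  # intentionally invalid
]

def validate(rows, schema):
    """Return None if the rows fit the schema, else the conversion error."""
    try:
        # from_pylist casts each dict to the declared schema and raises
        # if a value cannot be converted to the declared type.
        pa.Table.from_pylist(rows, schema=schema)
        return None
    except (pa.ArrowInvalid, pa.ArrowTypeError) as exc:
        return exc

err = validate(rows, schema)
print("valid" if err is None else f"invalid: {err}")
```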
AttributeError: module 'pyarrow.lib' has no attribute 'ListViewType'
AttributeError                            Traceback (most recent call last)
Cell In[67], line 2
      1 import pandas as pd
----> 2 import pyarrow.parquet as pq
      3 import pyarrow.lib as _lib
      4 import pyarrow as pa

File /opt/anaconda3/lib/python3.11/site-packages/pyarrow/parquet/__init__.py:20
      1 # Licensed to the Apache Software Foundation (ASF) under one
      2 # or more contributor license agreements. See the NOTICE file
[…]
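This error usually points at a stale or mixed installation: the compiled pyarrow.lib being imported is older than what pyarrow.parquet expects (ListViewType, as far as I know, only exists in newer pyarrow releases). A minimal sanity check, assuming a standard pip/conda environment:

```python
# Confirm which pyarrow installation is actually being imported
# and what version it reports; a mismatch between an old compiled
# pyarrow.lib and a newer pyarrow.parquet typically needs a clean reinstall.
import pyarrow as pa

print(pa.__version__)   # version that defines (or not) ListViewType
print(pa.__file__)      # path shows which installation is picked up
```

If the version or path looks wrong, a clean reinstall (for example `pip install --upgrade --force-reinstall pyarrow`, or the conda-forge package) is the usual fix.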
Converting pyarrow Table to RecordBatches of a fixed byte size
I appreciate this might be impossible in the most general case, but is there a way to convert a pyarrow Table into record batches of a fixed maximum byte size? I know I can request a maximum number of rows per batch, so this should be possible if I can work out the maximum byte size of a row for a given schema. I didn't see anything in the docs of either pa.Schema or pa.Field about their sizes.
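PyArrow doesn't, as far as I can tell, publish a maximum row size per schema, and for variable-width columns (strings, lists) there isn't a fixed one. A rough workaround is to estimate the average bytes per row from the table itself and derive a `max_chunksize` from that; a sketch, with the caveat that skewed variable-width data can still overshoot the target:

```python
import pyarrow as pa

def batches_by_approx_bytes(table: pa.Table, target_bytes: int):
    """Yield record batches whose size is roughly bounded by target_bytes.

    Uses the table's average bytes per row as an estimate, so columns with
    very uneven value sizes can still produce oversized batches.
    """
    if table.num_rows == 0:
        yield from table.to_batches()
        return
    avg_row_bytes = max(1, table.nbytes // table.num_rows)
    rows_per_batch = max(1, target_bytes // avg_row_bytes)
    yield from table.to_batches(max_chunksize=rows_per_batch)

# Example: aim for batches of roughly 1 MiB.
table = pa.table({"x": list(range(1_000_000))})
for batch in batches_by_approx_bytes(table, 1 << 20):
    print(batch.num_rows, batch.nbytes)
```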
Representing Python enums in PyArrow and Parquet files
Do PyArrow (Arrow) and its Parquet file format have a native way to serialise Python enums? If not, what would be a storage-efficient way to encode the enum strings, so that preferably other Parquet readers could also understand them?
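Arrow has no dedicated enum type that I'm aware of. The usual storage-efficient representation is a dictionary-encoded string column: Parquet stores each distinct string once plus small integer indices, and other Parquet readers simply see a string column. A sketch using a hypothetical Color enum:

```python
import enum
import pyarrow as pa
import pyarrow.parquet as pq

class Color(enum.Enum):          # hypothetical enum for illustration
    RED = "red"
    GREEN = "green"
    BLUE = "blue"

values = [Color.RED, Color.BLUE, Color.RED, Color.GREEN]

# Store the enum *values* as strings, then dictionary-encode so the file
# keeps one copy of each distinct string plus compact integer indices.
arr = pa.array([c.value for c in values]).dictionary_encode()
table = pa.table({"color": arr})
pq.write_table(table, "colors.parquet")

# Readers that know nothing about the Python enum still see strings;
# mapping back on the Python side is a plain lookup.
read_back = pq.read_table("colors.parquet").column("color").to_pylist()
restored = [Color(v) for v in read_back]
```

Writing `c.value` rather than `c.name` is an arbitrary choice here; either works as long as the reading code maps the strings back consistently.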
Does the Arrow file format convert pandas datetime64[ns] timestamp values to nanoseconds?
I have a pandas dataframe with a datetime column containing the following values:
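The actual values are truncated above, so the following sketch uses placeholder timestamps; the point is simply that a datetime64[ns] column comes through as an Arrow timestamp[ns] column, i.e. the data stays as nanosecond counts since the Unix epoch:

```python
import pandas as pd
import pyarrow as pa

# Placeholder data; the question's actual column values are not shown.
df = pd.DataFrame(
    {"ts": pd.to_datetime(["2021-01-01 00:00:00.123456789",
                           "2021-01-02 12:34:56.000000001"])}
)
print(df["ts"].dtype)                 # datetime64[ns]

table = pa.Table.from_pandas(df)
print(table.schema.field("ts").type)  # timestamp[ns]
# The stored values are nanosecond counts since the epoch:
print(table.column("ts").cast(pa.int64()).to_pylist())
```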