I saved a Python pandas DataFrame of shape (30M x 25) as an Apache Arrow table, and I'm now reading that table in Julia with:
input_arrow = Arrow.Table("path/to/table.arrow")
My question is: how can I iterate over the rows of input_arrow efficiently?
If I just do:
for c in input_arrow
    # do something
end
then I end up iterating over the columns, but I need to iterate over the rows.
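To make the column-wise behavior concrete, here is a minimal sketch that uses a tiny in-memory table as a stand-in for my real file. The column names `a` and `b` and the values are made up for illustration, and `Arrow.tobuffer` is only there so the example does not need a file on disk:

```julia
using Arrow

# Hypothetical stand-in for the real file: a tiny table serialized to an
# in-memory buffer and read back as an Arrow.Table.
tbl = Arrow.Table(Arrow.tobuffer((a = [1, 2, 3], b = [4.0, 5.0, 6.0])))

# Iterating the table directly yields whole columns, not rows.
col_lengths = Int[]
for c in tbl
    push!(col_lengths, length(c))  # each `c` is a full column vector
end
println(col_lengths)
```

The loop runs once per column (twice here), and each element has as many entries as the table has rows.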
Something else that I've tried is converting the Arrow.Table into a DataFrames.DataFrame:
df = DataFrames.DataFrame(input_arrow)
for row in eachrow(df)
    # do something
end
But this method is very slow. It reminds me of how slow df.iterrows() is in pandas.
So, what is the fast way (similar to df.itertuples()) to iterate over the rows of an Arrow.Table in Julia?