Say I have:
In [1]: import pyarrow as pa
In [2]: ca = pa.chunked_array([[1,2,3], [4,5,6]])
In [3]: ca
Out[3]:
<pyarrow.lib.ChunkedArray object at 0x7f7afaaa4a90>
[
[
1,
2,
3
],
[
4,
5,
6
]
]
I’d like to do something like
ca[[1, 4]] = [999, 888]
and end up with
<pyarrow.lib.ChunkedArray object at 0x7f7aef473d30>
[
[
1,
999,
3
],
[
4,
888,
6
]
]
I don’t really mind if the result gets rechunked.
PyArrow can replace values based on a mask (strictly speaking it replaces rather than sets, since a pyarrow Array is immutable), but not yet based on indices. For now, you can convert your indices to a mask and then use pyarrow.compute.replace_with_mask:
import numpy as np
import pyarrow as pa
import pyarrow.compute as pc
ca = pa.chunked_array([[1,2,3], [4,5,6]])
indices = [1, 4]
mask = np.zeros(len(ca), dtype=bool)
mask[indices] = True
pc.replace_with_mask(ca, mask, [999, 888])
gives
<pyarrow.lib.ChunkedArray object at 0x7fdfea44ddf0>
[
[
1,
999,
3
],
[
4,
888,
6
]
]
This assumes that the indices are sorted, though (see the comments for a workaround when that is not the case).
And see https://github.com/apache/arrow/issues/25505 for a feature request to add a function to replace values directly based on indices.
Arrow data is immutable, so values can be selected but not assigned.
https://arrow.apache.org/docs/python/data.html#arrays
So you would have to do something like this:
ca2 = pa.chunked_array([ca[:1],[999], ca[2:]])