Say I have:
import pyarrow as pa
ca = pa.chunked_array([['a', 'b', 'b', 'c']])
print(ca)
<pyarrow.lib.ChunkedArray object at 0x7fc938bcea70>
[
[
"a",
"b",
"b",
"c"
]
]
I’d like to end up with:
pyarrow.Table
_a: uint8
_b: uint8
_c: uint8
----
_a: [[1,0,0,0]]
_b: [[0,1,1,0]]
_c: [[0,0,0,1]]
How can I do this?
I’m aware that it’s possible to do this by converting to pandas, but is it possible to do it with just PyArrow (to avoid taking on an extra dependency)?