I have some not-quite-columnar data like this:
"hello", "2024 JAN", "2024 FEB"
"a", 0, 1
If it were truly columnar, it would look like:
"hello", "year", "month", "value"
"a", 2024, "JAN", 0
"a", 2024, "FEB", 1
Suppose the data is in the form of a numpy array, like this:
import numpy as np
data = np.array([["hello", "2024 JAN", "2024 FEB"], ["a", "0", "1"]], dtype="<U")
data
array([['hello', '2024 JAN', '2024 FEB'],
       ['a', '0', '1']], dtype='<U8')
Imagine also that I created a table:
import duckdb as ddb
conn = ddb.connect("hello.db")
conn.execute("CREATE TABLE columnar (hello VARCHAR, year UINTEGER, month VARCHAR, value UINTEGER);")
How could I go about efficiently inserting data into the DuckDB table columnar?
The naive/easy way would be to transform the data into a columnar format in memory, in Python, before inserting it into the DuckDB table. That will be slow, though, if I have a lot of data…
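For reference, the naive approach described above might look roughly like the sketch below. The helper name wide_to_long is made up for illustration, and it assumes every header after the first has the form "<year> <month>" and every value parses as an integer. It works on a plain nested list as well as on the 2-D numpy string array shown earlier.

```python
def wide_to_long(data):
    """Reshape a wide table (header row + value rows) into long-format rows.

    Hypothetical helper: assumes headers after the first look like
    "<year> <month>" and that all values are integer strings.
    """
    header, *body = data
    # Split each period header, e.g. "2024 JAN" -> ["2024", "JAN"].
    periods = [str(col).split() for col in header[1:]]
    rows = []
    for record in body:
        key = str(record[0])
        # Emit one output row per (record, period) pair.
        for (year, month), value in zip(periods, record[1:]):
            rows.append((key, int(year), month, int(value)))
    return rows


data = [["hello", "2024 JAN", "2024 FEB"], ["a", "0", "1"]]
rows = wide_to_long(data)
print(rows)  # [('a', 2024, 'JAN', 0), ('a', 2024, 'FEB', 1)]
```

The resulting rows could then be handed to something like conn.executemany("INSERT INTO columnar VALUES (?, ?, ?, ?)", rows). Both the Python-level loop and the row-by-row insert are what I expect to be slow on large inputs.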