I’m having a problem when uploading a DataFrame, coming from a pandas generator, using DuckDB. I’ve already looked for similar errors, but none of them brought me a solution in DuckDB syntax.
Context: I’m collecting data from a SQL Server 17 database, using pandas. I collect this data through chunks of 100 thousand lines, which are stored in a generator, an iterable object. I then access, via for loop, each of the DataFrames and insert this data into my PostgreSQL database.
Here’s the code:
conn=module.connect_sqlserver()
with open(os.path.join(os.getcwd(), 'query.sql'), 'r') as q:
querytxt = q.read()
df_chunks = pd.read_sql(sql=querytxt, con=conn, chunksize=100000)
for df in df_chunks:
count += 1
try:
df_duck = df
duckdb.default_connection.execute("SET GLOBAL pandas_analyze_sample=100000")
if count == 1:
duck_con.sql(f"""
{attach_statement};
DROP TABLE IF EXISTS postgres_db.{schema}.{table_name} CASCADE;
CREATE TABLE IF NOT EXISTS postgres_db.{schema}.{table_name} ({schema_txt});
INSERT INTO postgres_db.{schema}.{table_name} SELECT * FROM df_duck;
""")
elif count > 1:
duck_con.sql(f"""
{attach_statement};
INSERT INTO postgres_db.{schema}.{table_name} SELECT * FROM df_duck;
""")
Was returned, in last chunk: index 632(x) is out of bounds for axis 0 with size (x)
I tried create unique index, i tried a data sample (and in its worked, but not in all table)
Juan Mangueira is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.