Using Python 3.10, I’m trying to read some data into a DuckDB (v1.0.0) table, delete some rows, and then cast a column to a different data type. The deleted rows contained values that cannot be cast to the new type, but since those rows are gone, I did not expect this to be a problem.
Here is a minimal example:
Test data file:
Col1,Col2,Col3,Col4
0,a,0,b
0,c,0,d
1,e,0,f
1,g,0,h
1,i,0,j
Junk,,,
2,k,0,l
2,m,z,n
Junk,,,
3,o,0,p
Junk,,,
In Python I run:
import duckdb
con = duckdb.connect()
con.execute(""" CREATE TABLE test AS SELECT * FROM read_csv("test.csv") """)
con.execute(""" DELETE FROM test WHERE Col1 LIKE 'Junk' """)
con.execute(""" ALTER TABLE test ALTER Col1 SET DATA TYPE INTEGER """)
The last statement raises the following exception:
ConversionException: Conversion Error: Could not convert string 'Junk' to INT32
However, after the DELETE, neither printing the table nor running a SELECT with a WHERE filter for ‘Junk’ returns any Junk rows, so I assumed these rows had indeed been deleted. When I copy the contents to a new table, e.g. with
con.execute(""" CREATE TABLE test2 AS SELECT * FROM test """)
the ALTER works fine on this new table.
I don’t really understand what is going on here (or rather, why the DELETE seems to hide the rows instead of removing them), and I didn’t find anything useful in the DuckDB issues, the docs, the DuckDB in Action book, or on SO, apart from people recommending a VACUUM, which does not do anything here (tested, and also according to the DuckDB manual).
Another option I see is to provide a conversion function for the ALTER that returns some useful value or NULL for non-numeric strings, but I’d rather understand what the problem is. Do I always have to copy the entire table in situations like these?
Thanks in advance,
PJJP