I have a table in postgres and basically i’m trying to duplicate rows based on the delimiter @ in the description column. Here is my table:
txn_id | description |
---|---|
3332654 | [email protected]@0.9397@$10.64@[email protected]@23.8235@$6.30@KRW@36,[email protected]@$41.84@[email protected]@1.5711@$12.73@[email protected]@0.6013@$8.32@[email protected]@10.6013@$18.32 |
3332655 | [email protected]@0.8197@$11.64@[email protected]@21.8135@$61.30@KRW@36,[email protected]@$411.84@[email protected]@11.5711@$11.73 |
Using the below postgresql code, the rows are duplicated based on a set of four values:
select txn_id
,max(case when nr%4=1 then elem end) cncy_cd
,max(case when nr%4=2 then elem end) txn_cncy_amt
,max(case when nr%4=3 then elem end) txn_exch_rate
,max(case when nr%4=0 then elem end) txn_nzd_amt
from test t
left join lateral (select elem,row_number()over() nr
from unnest(string_to_array(t.description, '@')) AS a(elem)) ON true
group by txn_id,(nr-1)/4 ;
and the output will be:
txn_id | cncy_cd | txn_cncy_amt | txn_exch_rate | txn_nzd_amt |
---|---|---|---|---|
3332654 | SGP | 15.00 | 10.6013 | $18.32 |
3332654 | EUR | 5.00 | 0.6013 | $ 8.32 |
3332654 | KRW | 36,000.00 | 860.3722 | $41.84 |
3332655 | THB | 36,001.00 | 861.3722 | $411.84 |
3332654 | FJD | 150.00 | 23.8235 | $6.30 |
3332654 | CAD | 20.00 | 1.5711 | $12.73 |
3332654 | FJD | 10.00 | 0.9397 | $10.64 |
3332655 | CAD | 21.00 | 11.5711 | $11.73 |
3332655 | THB | 11.00 | 0.8197 | $11.64 |
3332655 | SGP | 110.00 | 21.8135 | $61.30 |
Below is the demo of the postgres code: https://dbfiddle.uk/BGOeoIlM
I’m trying to achieve the same result using duckdb and convert it to a polars dataframe and below is my code:
a = duckdb.sql("create table test (txn_id int, description varchar(200));insert into test values (3332654,'[email protected]@0.9397@$10.64@[email protected]@23.8235@$6.30@KRW@36,[email protected]@$41.84@[email protected]@1.5711@$12.73@[email protected]@0.6013@$8.32@[email protected]@10.6013@$18.32'),(3332655,'[email protected]@0.8197@$11.64@[email protected]@21.8135@$61.30@KRW@36,[email protected]@$411.84@[email protected]@11.5711@$11.73');select txn_id ,max(case when nr%4=1 then elem end) cncy_cd ,max(case when nr%4=2 then elem end) txn_cncy_amt, max(case when nr%4=3 then elem end) txn_exch_rate ,max(case when nr%4=0 then elem end) txn_nzd_amt from test t left join lateral (select elem,row_number()over() nr from unnest(string_to_array(t.description, '@')) AS a(elem)) ON true group by txn_id,((nr-1)//4);")
Below is the output from duckdb and it does not match with the postgres output.
txn_id | cncy_cd | txn_cncy_amt | txn_exch_rate | txn_nzd_amt |
---|---|---|---|---|
3332654 | FJD | $18.32 | 10.6013 | 15.00 |
3332655 | THB | 110.00 | 21.8135 | $61.30 |
3332654 | SGP | $8.32 | 0.6013 | 5.00 |
3332655 | CAD | 11.00 | 0.8197 | $11.64 |
3332654 | KRW | $6.30 | 23.8235 | 150.00 |
3332655 | KRW | 36,001.00 | 861.3722 | $411.84 |
3332655 | FJD | 21.00 | 11.5711 | $11.73 |
3332654 | THB | $10.64 | 0.9397 | 10.00 |
3332654 | EUR | $12.73 | 1.5711 | 20.00 |
3332654 | CAD | $41.84 | 860.3722 | 36,000.00 |
Not sure what i’m missing here as i tried spending lot of time on this and could not figure out why it would behave differently. Great if someone could please help me with this?