Database: Snowflake
DBT to generate the SQL model.
I have a column in a table that contains the list of product ids owned by user.
Table: users
user_id | name |
---|---|
u1 | user 1 |
u2 | user 2 |
Table: products
product_id | product_name |
---|---|
1 | A |
2 | B |
3 | C |
Table: user_attributes
date | user_id | product_list |
---|---|---|
2024-12-01 | u1 | 1,2,3 |
2024-12-01 | u2 | 2,3,4 |
2024-12-02 | u1 | 2,3 |
2024-12-02 | u2 | 2,3,4 |
Now, I want to return the product names owned by user.
Expected output:
date | user_id | product_names |
---|---|---|
2024-12-01 | u1 | A,B,C |
2024-12-01 | u2 | B,C,D |
I tried this but getting an error
Unsupported subquery type cannot be evaluated
select
*, product_list,
(select listagg(product_name, ',')
from products
where product_id in (select value
from table(split_to_table(product_list, ',')))) as product_names
from
users_attributes ua
left join
users u on ua.user_id = u.user_id
When I tried by hardcoding the product_list
in subquery, it works executes but of course that is not what I want.
select
*, product_list,
(select listagg(product_name, ',')
from products
where product_id in (select value
from table(split_to_table('1,2', ',')))) as product_names
from
users u
left join
user_attributes ua on u.user_id = ua.user_id
3
LATERAL split_to_table
to split the product lists into rows.- Then we get a distinct list of product ids based on date and user_id
- And then those are joined iwth product table to get product names.
LISTAGG
will then convert the rows to comma separated values.
Query
WITH expanded_products AS (
SELECT
ua.date,
ua.user_id,
TO_NUMBER(split_value.value) AS product_id
FROM
user_attributes ua,
LATERAL split_to_table(ua.product_list, ',') AS split_value
)
SELECT
ep.date,
ep.user_id,
LISTAGG(p.product_name, ',') WITHIN GROUP (ORDER BY p.product_name) AS product_names
FROM
(SELECT DISTINCT date, user_id, product_id FROM expanded_products) ep
JOIN
products p ON p.product_id = ep.product_id
GROUP BY
ep.date, ep.user_id
ORDER BY
ep.date, ep.user_id;
Output
I tried this out with a CTE and found a way to get you the result you needed. Basically, I unzipped that product list column, joined the attributes to the products and users, then zipped it back up again.
WITH split_attributes as (
SELECT date,
user_id,
value as product_id
FROM user_attributes, LATERAL SPLIT_TO_TABLE(product_list, ','))
SELECT
users.user_id,
users.name,
split_attributes.date,
LISTAGG(products.product_name,',') AS product_names
FROM users
LEFT JOIN split_attributes
ON split_attributes.user_id = users.user_id
LEFT JOIN products
ON split_attributes.product_id = products.product_id
GROUP BY split_attributes.date, users.user_id, users.user_id;
A slightly improved and readable approach similar to using UDF. Product ID enriched using the UDF. As the comments mentioned, storing a list of items in an array is better than a comma-separated string.
CREATE OR REPLACE FUNCTION product_name_mapping(product_list string)
RETURNS STRING
AS
$$
SELECT
LISTAGG(product_name, ',')
FROM products
WHERE ARRAY_CONTAINS(CAST(product_id AS STRING)::variant, STRTOK_TO_ARRAY(product_list, ','))
$$
;
select
date,
user_id,
product_name_mapping(product_list) as product_names
from user_attributes ua;