I have the following dataset in Presto.
WITH A (name, distinct_values) AS (
VALUES
('color', ARRAY['red', 'yellow']),
('shape', ARRAY['triangle', 'square', 'circle']),
('size', ARRAY['big', 'small'])
)
select * from A
which looks like:
name | distinct_values |
---|---|
color | red,yellow |
shape | triangle,square,circle |
size | big,small |
I don’t know how many distinct values are in name
or distinct_values
ahead of time. What I want is a map column where each row has 3 entries, one for each possible combination of color, shape, and size.
tuples
{'color': 'red', 'shape': 'triangle', 'size': 'big'}
{'color': 'red', 'shape': 'triangle', 'size': 'small'}
{'color': 'red', 'shape': 'square', 'size': 'big'}
...
{'color': 'yellow', 'shape': 'circle', 'size': 'small'}
I can’t figure out how to do this in the case where the distinct names are not known ahead of time. Any ideas?
I don’t know how many distinct values are in
name
I want is a map column where each row has 3 entries, one for each possible combination of color, shape, and size.
Arguably those two requirements kind of contradict each other (though if you are interested only in 3 values in names
you can filter all others out).
Based on my understanding of the task – you can try to using CROSS JOIN
:
WITH A (name, distinct_values) AS (
VALUES
('color', ARRAY['red', 'yellow']),
('shape', ARRAY['triangle', 'square', 'circle']),
('size', ARRAY['big', 'small'])
),
-- query parts
b as (
select name, value
from a
, unnest(distinct_values) as t (value)
)
select cast(row(b_color.value, b_shape.value, b_size.value) as row(color varchar, shape varchar, size varchar ))
from b b_color
cross join b as b_shape
cross join b as b_size
where b_color.name = 'color' AND b_shape.name = 'shape' AND b_size.name = 'size';
Output:
_col0 |
---|
{color=red, shape=triangle, size=big} |
{color=red, shape=square, size=big} |
{color=red, shape=circle, size=big} |
{color=red, shape=triangle, size=small} |
{color=red, shape=square, size=small} |
{color=red, shape=circle, size=small} |
{color=yellow, shape=triangle, size=big} |
{color=yellow, shape=square, size=big} |
{color=yellow, shape=circle, size=big} |
{color=yellow, shape=triangle, size=small} |
{color=yellow, shape=square, size=small} |
{color=yellow, shape=circle, size=small} |
Note that this will be a quite taxing query for any input of significant size (though potentially you can optimize it a bit, for example doing filtering before doing the cartesian product)
1