I’m working with a Snowflake table where each row contains structured data in two columns. Here’s an example of the table with 3 rows:
| id | data_1 | data_2 |
|----------------------------------------------|
| 1 | 2024-01-01, 1 | 2023-02-01, 2 |
| | 2024-01-02, 2 | 2023-02-02, 2 |
|----------------------------------------------|
| 2 | 2024-02-03, 3 | 2023-02-01, 20 |
| | 2024-02-02, 4 | 2023-02-02, 20 |
|----------------------------------------------|
| 3 | 2024-03-01, 5 | 2023-02-01, 30 |
| | 2024-03-02, 6 | 2023-02-02, 30 |
|----------------------------------------------|
Conceptually, in each row, columns data_1
and data_2
contain an array of structured data where each element has a date and an integer value associated with it.
I want to query this table and obtain the following results:
- For
data_1
: the value associated with the latest date. - For
data_2
: the sum of all values.
Desired result of the query:
| id | data_1 | data_2 |
|----------------------|
| 1 | 2 | 4 |
|----------------------|
| 2 | 3 | 40 |
|----------------------|
| 3 | 6 | 60 |
|----------------------|
Questions:
- Which Snowflake data types are best suited for
data_1
anddata_2
? Ideally, I prefer a data type where I can address each sub-element using a name (something akin todata_1.date
anddata_1.value
, so perhaps astructured object
or astructured map
). - How do you write the query to achieve the desired result?
Thank you!