I have a query in Snowflake that produces JSON output in rows — 29,000+ of them. I use this query to create records in an MDM system, but the ingestion process is inefficient and costly. I have since taken my final SELECT statement that created the JSON and placed it in a CTE, then made a new final SELECT statement:
SELECT ARRAY_AGG(*) as clin_prov
FROM clin_prov
This overwhelmed the system due to its size. Using a LIMIT of 1000, or even 5000, in the CTE produced results, making a large JSON record, as expected and desired. I'd like the final output to be around 30 rows, each an array of <= 1000 JSON records. Is there a way to do that? Can I create arrays based on LIMIT and OFFSET and feed them into separate rows?
I tried with and without LIMIT in the clin_prov CTE. I also tried separate CTEs based on LIMIT and OFFSET combined with UNION ALL; that overloaded the system as well.
Yes, you can batch. Try the query below, adjusting the NTILE value to set the number of batches.
WITH clin_prov AS (
    SELECT
        ...,
        NTILE(50) OVER (ORDER BY some_column) AS batch
    FROM
        ...
)
SELECT batch, ARRAY_AGG(your_json_column) AS clin_prov
FROM clin_prov
GROUP BY batch
ORDER BY batch;
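NTILE(n) splits the ordered rows into n buckets whose sizes differ by at most one, so for 29,000 rows, NTILE(30) keeps every batch well under the 1,000-record limit. As a rough sanity check outside Snowflake, here is a small Python sketch (a hypothetical helper, not Snowflake code) that mimics NTILE's bucket assignment:

```python
def ntile(n_rows: int, n_buckets: int) -> list[int]:
    """Assign each of n_rows (in order) a bucket number 1..n_buckets,
    mimicking SQL NTILE: the first (n_rows % n_buckets) buckets get
    one extra row, so bucket sizes differ by at most one."""
    base, extra = divmod(n_rows, n_buckets)
    assignment = []
    for bucket in range(1, n_buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        assignment.extend([bucket] * size)
    return assignment

# 29,000 rows into 30 buckets: 29000 = 30*966 + 20,
# so 20 buckets of 967 rows and 10 buckets of 966 rows.
batches = ntile(29_000, 30)
sizes: dict[int, int] = {}
for b in batches:
    sizes[b] = sizes.get(b, 0) + 1
```

Every batch ends up with 966 or 967 rows, so each ARRAY_AGG per batch stays comfortably below the 1,000-record target.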