How to improve a join between two huge tables, one tables contains 4 billion records and another contains 2 billion records in snowflake on L size warehouse. These join occurs when thoughtspot triggers a query on snowflake. Currently it is taking 4 mins. I want to bring it down to less than 10 secs. I tried doing clustering, QAS etc. The join is inner join and relation ship is one to many.
I tried Query acceleration service, clustering, increased warehouse size.
Please find he below query
SELECT
DATE_TRUNC('MONTH', "ta_1"."DATE") "ca_1",
"ta_2"."FEATURELEVEL1" "ca_2",
count(DISTINCT "ta_3"."EVENTUUID") "ca_3"
FROM "C_GOLD"."PRODUCT_USAGE"."FEATURESBRIDGE" "MTA_0"
JOIN "C_GOLD"."PRODUCT_USAGE"."MAPPEDFEATURES" "ta_2"
ON "MTA_0"."FEATUREID" = "ta_2"."FEATUREID"
JOIN "C_GOLD"."PRODUCT_USAGE"."PRODUCTEVENTS" "ta_3"
ON (
"MTA_0"."APPSTRINGID" = "ta_3"."APPSTRINGID"
AND "MTA_0"."EVENTUUID" = "ta_3"."EVENTUUID"
AND "MTA_0"."DATEKEY" = "ta_3"."DATEKEY"
)
JOIN "C_GOLD"."MASTER_DATA"."DATEDIM" "ta_1"
ON "ta_3"."DATEKEY" = "ta_1"."DATEKEY"
JOIN "C_GOLD"."MASTER_DATA"."COMPANIES" "ta_5"
ON "ta_3"."COMPANYID" = "ta_5"."COMPANYID"
JOIN "C_GOLD"."MASTER_DATA"."PRODUCTTAGS" "MTA_1"
ON "ta_3"."PRODUCTTAGID" = "MTA_1"."PRODUCTTAGID"
LEFT OUTER JOIN "C_GOLD"."MASTER_DATA"."PARENTVIEW" "ta_4"
ON "MTA_1"."PARENTTAGID" = "ta_4"."PARENTTAGID"
WHERE (
"ta_1"."DATE" >= DATE('2023-08-01', 'YYYY-MM-DD')
AND "ta_1"."DATE" < DATE('2024-08-01', 'YYYY-MM-DD')
AND LOWER("ta_4"."PRODUCTTAGNAME") = 'smart 3d'
AND "ta_5"."ISINTERNAL" = FALSE
)
GROUP BY
"ca_1",
"ca_2"
ORDER BY "ca_1" ASC NULLS LAST
;
The Huge tables are Productevents and FeatureBridge.
6