I ran a simple query over 21 GB of data on an XL warehouse for 90 minutes before cancelling it. After the first ten minutes, nothing changed significantly in the query profile except the ‘Bytes sent over the network’ metric, which increased steadily the entire time and had passed 1,000 GB by the time I cancelled the query. What could cause that much data to be transferred over the network, and what can I do to improve the performance of this query?
Query:
select
    c.client_id,
    c.client_created_date,
    min(cd.status_date) as first_trial_date
from clients_d as c
left join clients_daily_a as cd
    on c.client_id = c.client_id
    and cd.started_trial
group by 1, 2
The clients_d table is under 1 GB, has 4,000,000 rows, and is unique on client_id. The clients_daily_a table is 21 GB, has 600,000,000 rows, and is unique on (client_id, status_date). started_trial is a boolean column.
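One thing worth checking against these sizes: the join condition as written compares c.client_id with itself, so it holds for every non-null client_id, and each of the 4,000,000 clients_d rows is paired with every clients_daily_a row where started_trial is true. The sketch below is only a guess at the intended query. It assumes clients_daily_a should be matched on cd.client_id (an assumption, not something stated above), and the first statement simply counts how many rows each client is currently joined to.

```sql
-- How many clients_daily_a rows pass the started_trial filter?
-- With the join as written, the intermediate result is roughly
-- 4,000,000 * trial_rows rows.
select count(*) as trial_rows
from clients_daily_a
where started_trial;

-- Sketch of the presumably intended join (assumption: the condition
-- was meant to compare c.client_id with cd.client_id).
select
    c.client_id,
    c.client_created_date,
    min(cd.status_date) as first_trial_date
from clients_d as c
left join clients_daily_a as cd
    on c.client_id = cd.client_id
    and cd.started_trial
group by 1, 2;
```

If that assumption is right, each client is matched only to its own daily rows, so the join output cannot exceed the number of started_trial rows plus the clients that have none.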
Query Profile: