UPDATE
target_table AS tgt
SET
campaign_name = src.campaign_name,
line_of_business = src.Campaign_Line_Of_Business
FROM
(
SELECT
A.project_id ,
A.campaign_id ,
A.run_id ,
B.Campaign_Line_Of_Business,
B.campaign_name
FROM
target_table A
left JOIN src_table B
ON
(A.Project_id = B.Project_id
AND A.Campaign_id = B.Campaign_id
AND A.Run_id = B.Run_id)
GROUP BY
1,
2,
3,
4,
5
) src
WHERE
(tgt.project_id = src.project_id
and tgt.campaign_id = src.campaign_id
and tgt.run_id = src.run_id);
Projection segmentation and ordering …
ORDER BY target_table.project_id,
target_table.campaign_id,
target_table.run_id,
target_table.campaign_node_id,
target_table.locale_cd
SEGMENTED BY hash(target_table.project_id,
target_table.campaign_id,
target_table.run_id,
target_table.campaign_node_id,
target_table.message_id,
target_table.locale_cd) ALL NODES KSAFE 1;
1
I’m not completely sure – it’s a bit of a special case when you use a query consisting of a join and a group by in the FROM clause of the UPDATE statement.
But:
The driving input table – which is also the target table, is segmented by the hash of these six:
SEGMENTED BY hash(
target_table.project_id,
target_table.campaign_id,
target_table.run_id,
target_table.campaign_node_id,
target_table.message_id,
target_table.locale_cd
)
… and the joining WHERE condition consists of these 3:
tgt.project_id = src.project_id
and tgt.campaign_id = src.campaign_id
and tgt.run_id = src.run_id
If we segment by columns a,b,c,d,e,f
, and these are integers, the combination:
a | b | c | d | e | f
1 | 1 | 1 | 1 | 1 | 1
could be on node 1, and the combination:
a | b | c | d | e | f
1 | 1 | 1 | 1 | 2 | 1
, because it leads to a different hash value, could be on node 2.
So the a,b,c
combination used for the join will cause the same combination to exist on all nodes, and therefore resegmentation is indispensable to make the join possible.
Try creating a new projection segmented and ordered by project_id
, campaign_id
, and run_id
, if there are more run_id
s than campaign_id
s and more campaign_id
s than project_id
s – otherwise put the column with the lowest cardinality first in the ORDER BY
list, followed by the next higher cardinality, etc.