I am trying to figure out how BigQuery Continuous Queries work. I’m interested in whether they can be used for BigQuery outbound CDC. I need to capture and respond to data change events in a BigQuery table.
I created the following BigQuery Continuous Query:
INSERT INTO `myproject.test.dst_table`
SELECT
id,
name,
age
FROM `myproject.test.src_table`;
Right after starting this query, the data from the src_table
was successfully copied to the dst_table
. So far so good.
But if, for example, I update a row in the src_table
UPDATE `test.src_table`
SET Name = 'Alex'
WHERE Id = 2;
immediately after that, the number of records in the dst_table
almost doubles. A new row with the update information arrives, along with many other rows that I didn’t update. Is this the correct behavior of BigQuery Continuous Queries, and if so, how can they be configured properly to only send the information that I updated without including the rest of the rows that weren’t affected at all during the UPDATE?
Based on this answer /a/78572235/1219755, it seems that BigQuery Continuous Queries can be used for BigQuery outbound CDC, but how should they be configured correctly for this purpose?
Expected behavior: after starting a BigQuery Continuous Query, all existing data is replicated from the src
table to the dst
table, and then only new incoming changes are replicated to the dst
table.
I also just closed the tab for my active Continuous Query in the GCP Web Console for BigQuery, and now I can’t find it anywhere to, for example, stop it. Where can I find it?
1