I need to replicate Cloud SQL for PostgreSQL (source database) to Cloud SQL for PostgreSQL (destination database). All changes should be applied to the destination database in batches within separate transactions, just as they were originally done in the source database.
To do this, I am considering using GCP Datastream for CDC.
In the documentation https://cloud.google.com/datastream/docs/events-and-streams, I see that each change contains a tx_id
property:
PostgreSQL tx_id string string The transaction ID to which the event belongs.
Based on this value, in my code, I can buffer changes that belong to the same transaction before applying these changes as part of a single transaction to the destination database. But how can I know that all changes within a single tx_id
have already been received from the source database? Is there any property that comes with the data from the source database indicating that these are the last changes for that transaction? How should I properly organize the logic for buffering changes before applying them to the destination database?
1