I am developing an ETL framework that extracts data from an Azure SQL Database into Blob Storage using SQL Change Tracking. I store the last Change Tracking ID per table in an “ETL Helper” table, which is read on each load to grab the starting version for that table. When the helper table is set up for the first time, all of these IDs start at 0.
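For context, the helper table looks roughly like this (the names here are illustrative, not my exact schema):

-- Illustrative sketch of the “ETL Helper” table; real names differ.
CREATE TABLE dbo.ETLHelper
(
    TableName     SYSNAME NOT NULL PRIMARY KEY,
    LastCTVersion BIGINT  NOT NULL DEFAULT 0  -- 0 when a table is first registered
);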
As part of my testing, I upserted 2,500 rows into one particular table, which moved the Change Tracking version on to 7. When I executed the Data Factory pipeline, it performed a full extract of 1.5 million rows for the table rather than just the latest changes.
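I sanity-checked the version numbers with something like the following:

-- Current database-wide version, and the oldest version that is still
-- valid as a baseline for this table (a baseline older than this means
-- CHANGETABLE results cannot be trusted and a full reload is required).
SELECT CHANGE_TRACKING_CURRENT_VERSION() AS current_version
     , CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('dbo.TestTable')) AS min_valid_version;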
Here is the harness SQL I created, based on what Data Factory executes:
DECLARE @LastCTId BIGINT = 0     -- read from the ETL Helper table; 0 on the first load
      , @CurrentCTId BIGINT;     -- change tracking versions are BIGINT, not INT

SET @CurrentCTId = CHANGE_TRACKING_CURRENT_VERSION(); -- at the time of execution, this was 7

SELECT C.*
     , CT.commit_time AS last_change_datetime
FROM [dbo].TestTable C
JOIN CHANGETABLE(CHANGES [dbo].TestTable, @LastCTId) T ON C.Id = T.Id
JOIN sys.dm_tran_commit_table CT ON T.SYS_CHANGE_VERSION = CT.commit_ts
WHERE (
        -- take the greater of the row's change version and creation version
        SELECT MAX(v)
        FROM (VALUES (T.SYS_CHANGE_VERSION), (T.SYS_CHANGE_CREATION_VERSION)) AS V(v)
      ) <= @CurrentCTId;
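After a successful extract, the pipeline writes @CurrentCTId back to the helper table so the next run starts from there. Roughly, using the illustrative helper table names from above:

-- Persist the high-water mark for the next incremental load.
UPDATE dbo.ETLHelper
SET LastCTVersion = @CurrentCTId
WHERE TableName = 'dbo.TestTable';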
Is there anything obvious that I am doing wrong?