Today I had a technical interview for a Data Engineer position and they asked me on what is an incremental loading (in SQL por example) and how to do it, with some examples.
I said that is a way to store data in our target repository/table/dataset from the source data when we want to store only the new or updated data, rather than loading the entire dataset or have duplicates.
I explained two similar ways: checking a new column (in the target) added named “Last timestamp” with the last updated date in the target, comparing to the “date” column of the source data. The second way is comparing the typical “ID” (an automatic incremental one).
You can do a MERGE WHEN NOT MATCHED –> INSERT new data
WHEN MATCHED –> UPDATE a few columns
I could say something more? I want to know if it’s quite good like this or something could be said just in case in another time they ask me again about incremental load.
Other approaches and more and better explanation to talk about incremental load.
Iván Ramos is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.