I am trying to use Delta Lake merge to replay DB transactions (tagged as inserts, updates, and deletes).
I’m splitting the inserts out from the mods (deletes/updates) and running the inserts first, so that when the mods are applied the record is guaranteed to already be in the Delta table.
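To make the setup concrete, here is a rough sketch of that split in plain Python rather than the actual Spark job. The field names (`op`, `id`, `seq`) and the sample rows are hypothetical, chosen only for illustration.

```python
# Hypothetical change records: `op` is the transaction tag, `id` the record key,
# `seq` an ordering column (e.g. a transaction sequence number).
batch = [
    {"op": "insert", "id": 1, "seq": 1, "value": "a"},
    {"op": "update", "id": 1, "seq": 2, "value": "b"},
    {"op": "delete", "id": 1, "seq": 3, "value": None},
    {"op": "insert", "id": 2, "seq": 4, "value": "c"},
]

def split_batch(changes):
    """Separate inserts from mods (updates/deletes), keeping the input order."""
    inserts = [c for c in changes if c["op"] == "insert"]
    mods = [c for c in changes if c["op"] in ("update", "delete")]
    return inserts, mods

inserts, mods = split_batch(batch)
# The inserts would be merged first, then the mods in a second merge.
```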
I’ve sorted the DataFrame accordingly. However, I can’t find anywhere in the documentation whether, if a single batch of mods contains more than one update or delete affecting the same record, they will be applied in the correct order. Obviously we want to end up with the most recent version in the Delta table. I’m assuming this is the case, but I’m not sure and cannot find any reference. I’m wary, since enforcing order like this often cramps the style of distributed systems, so I wouldn’t be surprised if it is in fact NOT guaranteed.
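One way to avoid depending on ordering at all might be to collapse each batch of mods to the latest change per record before merging, so the merge never sees more than one row per key. The sketch below shows the idea in plain Python, not Spark; the `id` and `seq` field names are hypothetical, and it assumes an ordering column is available on every change.

```python
def latest_per_key(mods, key="id", order="seq"):
    """Keep only the most recent change for each record key.

    `key` and `order` are hypothetical column names used for illustration.
    """
    latest = {}
    for change in mods:
        current = latest.get(change[key])
        # Replace the stored change when this one is at least as recent.
        if current is None or change[order] >= current[order]:
            latest[change[key]] = change
    return list(latest.values())

# Hypothetical batch with two mods touching the same record (id 1).
sample_mods = [
    {"op": "update", "id": 1, "seq": 2, "value": "b"},
    {"op": "update", "id": 2, "seq": 3, "value": "x"},
    {"op": "delete", "id": 1, "seq": 5, "value": None},
]

collapsed = latest_per_key(sample_mods)
```

With this reduction, the result no longer depends on the order in which rows of the batch are processed, since only one change per record remains.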