Below is the JSON object which I have pushed from Cosmos DB to the data lake through an Azure Synapse data flow pipeline. Whenever this data changes, the change must overwrite the data in the existing CSV file in the data lake, and if there is no transaction file yet, it should create a new transaction file and continue to append the changes.
I want to create a pipeline to address the above requirement, as I am new to the Azure cloud platform.
Sample JSON:
{
"timeFrames": [
{
"yearGroup": "Reception",
"term": "Summer",
"scoreIndeces": {
"66": 0.2,
"67": 0.3
}
},
{
"yearGroup": "Year 1",
"term": "Spring",
"scoreIndeces": {
"64": 0.6,
"65": 0.6
}
}
],
"tableType": 1,
"id": "f82",
"productId": "4b",
"productName": "New PUMA"
}
(Screenshots in the original question showed the two files in the data lake: Transactions and Main.)
To achieve your requirement, you first need to copy the data from the main file to the transaction file, adding an additional column with the custom value Inserted, so that you get the initial transactions; a sketch of this step follows.
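As a rough illustration, the same step could be done in a Synapse Spark notebook. This is a minimal sketch, assuming CSV files with headers; the abfss paths and the column name Transaction Type are placeholders, not values taken from your pipeline.

# Initial copy: tag every row of the main file as "Inserted" and write it
# out as the first version of the transaction file. Paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()

main_df = (spark.read.option("header", True)
           .csv("abfss://<container>@<account>.dfs.core.windows.net/main/"))

initial_df = main_df.withColumn("Transaction Type", lit("Inserted"))

(initial_df.write.mode("overwrite").option("header", True)
 .csv("abfss://<container>@<account>.dfs.core.windows.net/transactions/"))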
Then, using a data flow, compare these files whenever changes happen in the new file, as below:
- Take a data flow activity and add a new data flow to it. Create two sources: the first using the transaction file and the second using the main file.
- After the second source, take an Exists transformation and set the Left stream to source 2, the Right stream to source 1, the Exist type to Doesn't exist, and the Exists conditions so that all columns from the sources are equal.
- This returns the updated rows. Then take a Derived column transformation, create a column with the same name as in the transaction file, Transaction Type, and assign the value Update to it.
- Then, after the first source, take a Union transformation and add the derived column transformation and source 1 as the incoming streams.
- This returns the required output; store it in the transaction file (see the PySpark sketch after this list for the equivalent logic).
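For reference, here is a minimal PySpark sketch of what the data flow above computes. It assumes both files share the same data columns, with the transaction file carrying the extra Transaction Type column; all paths are placeholders.

from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()

trans_path = "abfss://<container>@<account>.dfs.core.windows.net/transactions/"
main_path = "abfss://<container>@<account>.dfs.core.windows.net/main/"

trans_df = spark.read.option("header", True).csv(trans_path)
main_df = spark.read.option("header", True).csv(main_path)

# The Exists transformation with type "Doesn't exist" on all columns is a
# left anti join: keep only the main-file rows that have no exact match
# in the transaction file.
data_cols = main_df.columns
changed_df = main_df.join(trans_df.select(data_cols), on=data_cols, how="left_anti")

# Derived column: tag the changed rows as updates.
updates_df = changed_df.withColumn("Transaction Type", lit("Update"))

# Union the existing transactions with the new updates and rewrite the file.
result_df = trans_df.unionByName(updates_df)
(result_df.write.mode("overwrite").option("header", True)
 .csv("abfss://<container>@<account>.dfs.core.windows.net/transactions_out/"))

The anti join on all data columns means any row that changed in the main file, even by one value, is picked up as an update, which matches the "all columns equal" exists condition in the data flow.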
Note: when a Synapse pipeline writes data to ADLS Gen2 files, it can only overwrite them, not append to them, so a better way to handle your situation is to use an RDBMS such as Azure SQL Database; a sketch of that approach follows.
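If you do move to SQL, appending the transaction rows (the result_df from the sketch above) could look like the following, using Spark's generic JDBC writer. The server, database, table, user, and password are all placeholders.

# Append the new transaction rows to a SQL table instead of rewriting a file.
(result_df.write
 .format("jdbc")
 .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;databaseName=<db>")
 .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
 .option("dbtable", "dbo.Transactions")  # hypothetical target table
 .option("user", "<user>")
 .option("password", "<password>")
 .mode("append")  # append keeps the transaction history growing
 .save())

Because the table is append-only, each run adds just the Inserted or Update rows, which avoids the overwrite-only limitation of CSV files in the data lake.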