when i am tried for each activity inside we have data flow activity we unable to extract all file in
source transformation and count of each file using aggregate function
filname count
file1 10
file2 20
file3 11
file4 88
Source Transformation (Load Data Files):
Use a wildcard to load multiple data files.
Capture the file name and count the number of records for each file.
Join Transformation:
Join the data from the control file with the actual data files based on the file name.
Derived Column Transformation:
Compare the actual record count with the expected record count and flag mismatches.
Sink Transformation (Optional):
You can write the results to a sink, log the validation results, or trigger an alert if mismatches are found.
monica thopte is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
To achieve your requirement, make sure all of your source files must be of same type.
In the source dataset, give the path till your root folder and give the remaining path in the wild card file path like below.
Source:
After the source transformation, the filename
column will have values like /file1.csv
in it. Remove or replace it with ''
using a derived column with below expression.
replace(filename,'/','')
Now, use aggregate transformation with group by on filename
and create a column count
in the aggregate section with expression count(filename)
.
Now, use join this transformation and another source of control file using inner join transformation on filename
column.
This will join both transformation and will give expected results.
You can add another derived column for column flag
with expression equals(count,row_count)
to compare these values.
Now, you can write these directly to your sink or have a conditional split transformation in between and split the required data from mismatched data as per your requirement.