I’m encountering an issue while trying to copy files from an SFTP server to Blob Storage using Azure Data Factory (ADF) Copy Activity. The specific error message I’m receiving is as follows:
ErrorCode=DelimitedTextMoreColumnsThanDefined,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error found when processing 'Csv/Tsv Format Text' source 'CRM_XXX_###.CSV' with row number 1140713: found more columns than expected column count 369.,Source=Microsoft.DataTransfer.Common,'
Context:
- I’m using Azure Data Factory to copy CSV files from an SFTP server to Azure Blob Storage.
- The CSV file ‘CRM_XXX_###.CSV’ is large, roughly 3.8 GB.
- The specified column count in the dataset schema is 369.
Problem:
- The error occurs at row 1,140,713, where ADF parses more columns than the expected 369.
- This error halts the copy activity, preventing the file from being fully processed and copied.
Steps I’ve Taken:
- Checked the dataset schema in ADF to ensure the column count is set to 369.
- Verified the CSV file format and structure to confirm it adheres to the expected column count.
- Tried to locate and inspect the problematic row (1,140,713) in the CSV file, but the file is too large to find any anomaly by manual inspection.
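Since manual inspection isn’t feasible, the next thing I plan to try is a streaming check that reports every row whose parsed column count differs from 369 without loading the whole file into memory. A minimal sketch (the comma delimiter, standard quoting, and the function name are my assumptions; the real file would be opened with `open(..., newline="")` instead of the inline `StringIO` demo):

```python
import csv
import io

def find_mismatched_rows(lines, expected_cols, delimiter=","):
    """Yield (1-based row number, parsed column count) for every row
    whose column count differs from the expected schema width."""
    reader = csv.reader(lines, delimiter=delimiter)
    for row_number, row in enumerate(reader, start=1):
        if len(row) != expected_cols:
            yield row_number, len(row)

# Small demo with an expected width of 3; in practice I would stream
# the real file: with open("CRM_XXX_###.CSV", newline="") as f: ...
sample = io.StringIO('a,b,c\n1,2,3\n"x,y",2,3\n1,2,3,4\n')
print(list(find_mismatched_rows(sample, 3)))  # [(4, 4)]
```

I’m using the quote-aware `csv.reader` rather than a naive delimiter count, since an unescaped quote or an embedded delimiter is exactly the kind of defect that could make the parsed count disagree with the raw count.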
Questions:
- What could be causing this discrepancy in the column count?
- Are there any best practices or tools to handle or preprocess large CSV files to avoid such issues in ADF?
- Is there a way to configure ADF to skip or log the problematic rows instead of failing the entire copy activity?
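On the last question: if ADF itself can’t skip bad rows, one workaround I’m considering is preprocessing the file before the copy, diverting mismatched rows to a side file so the copy activity only ever sees well-formed rows. A minimal sketch (the function name, the comma delimiter, and the returned counters are my assumptions; `StringIO` stands in for real file handles):

```python
import csv
import io

def split_csv(src, good_out, bad_out, expected_cols, delimiter=","):
    """Stream src, writing rows with the expected column count to
    good_out and all other rows to bad_out for later inspection.
    Returns (kept, skipped) row counts."""
    reader = csv.reader(src, delimiter=delimiter)
    good = csv.writer(good_out, delimiter=delimiter)
    bad = csv.writer(bad_out, delimiter=delimiter)
    kept = skipped = 0
    for row in reader:
        if len(row) == expected_cols:
            good.writerow(row)
            kept += 1
        else:
            bad.writerow(row)
            skipped += 1
    return kept, skipped

# Demo with an expected width of 3; with real files the arguments
# would be open(...) handles instead of StringIO buffers.
good_buf, bad_buf = io.StringIO(), io.StringIO()
print(split_csv(io.StringIO('a,b,c\n1,2,3,4\n'), good_buf, bad_buf, 3))  # (1, 1)
```

The diverted rows could then be examined on their own to see whether the culprit is an unescaped quote, a stray delimiter, or an embedded newline.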
Additional Information:
- The source file contains several million rows.
- Manually inspecting the entire file is not feasible due to its size.
Any insights or suggestions on how to resolve or work around this issue would be greatly appreciated!