I am trying to process a CSV file in which one column contains JSON formatted data. In this JSON data, some fields use backslashes, which creates a problem when mapping fields with their values.
Link to download CSV file : https://drive.google.com/file/d/12ei9KzzIDwUiMcNS_O57URAudsolPTuH/view?usp=sharing
$file_name = 'new_testing.csv';
$file_path = '/var/www/html/'.$file_name;
$file_content = file_get_contents($file_path);
$file = new SplFileObject($file_path);
$file->setFlags(SplFileObject::READ_CSV | SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY);
// Read and skip the headers
$headers = $file->fgetcsv();
// Process the data
$chunkSize = 5000; // Set the desired chunk size
$loop = 0;
$chunkData = [];
while (!$file->eof()) {
// Read the next chunk of lines
for ($i = 0; $i < $chunkSize && !$file->eof(); $i++) {
$line = $file->fgetcsv();
if ($line) {
// Sanitize each field in the line to remove backslashes and double quotes
$line = array_map(function ($field) {
return str_replace(['\', '"'], '', $field);
}, $line);
// Adjust the number of elements in $line to match the number of headers
if (count($line) < count($headers)) {
// Fill missing values with null or empty strings
$line = array_pad($line, count($headers), null);
} elseif (count($line) > count($headers)) {
// Trim extra values
$line = array_slice($line, 0, count($headers));
}
// Combine headers and line values into an associative array
$row = array_combine($headers, $line);
$chunkData[] = $row;
}
}
}
echo "<pre>";
print_r($chunkData);
For the first two rows, I am getting the correct data, but starting from the third row, the “Event Value” column contains JSON data with backslashes, which creates a problem for mapping the keys and values.
3
PHP supports (as a default!) a non-standard CSV format that uses backslashes as escape characters. You have to explicitly disable this behaviour using an empty $escape
parameter to fgetcsv
, as described by the docs:
An empty string (
""
) disables the proprietary escape mechanism.
If you use this, your code will give you the JSON field correctly, with no extra sanitisation:
$file->fgetcsv(',', '"', '')