I’m currently building a small command-line tool with Node and node-xlsx to parse and process some Excel files, and I noticed that whenever the input contains the sequence
_X2069_
node-xlsx seems to convert it to Unicode character U+2069 (Pop Directional Isolate).
Here’s a minimal example:
const xlsx = require('node-xlsx');
function getData(excelSheet) {
data = [];
excelData = excelSheet[0]["data"]
length = 0
// Find the number of rows with actual data and filter out blank rows at the end
for (let i = 0; i < excelData.length; i++) {
if (excelData[i][0] != undefined) {
data.push([]);
data[i].push(excelData[i][0]); // only first column for now
}
}
return data;
}
const workSheet = xlsx.parse('./input.xlsx');
let rows = getData(workSheet);
console.log(rows);
And here’s my input (contents of input.xlsx):
_X2069_
_Y2069_
_Z2069_
The output is:
[ [ 'u2069' ], [ '_Y2069_' ], [ '_Z2069_' ]
which appears like:
My current workaround is to replace all occurrences of U+2069 with the original string, but that seems hacky, and I’m afraid that 2069 isn’t the only value that is automagically converted.
How can I prevent this erroneous conversion?