Problems identifying an invisible character
I’m analyzing a corpus of documents and noticed 4 instances of tokens that look identical but are recognized as different. Today I imported the dataset into another program, and it highlighted what looked like an empty space before the word: