I've made a program for school that goes through a plain text file and builds a concordance of its words. It takes each word, strips non-alphabetic characters from the front and back, and inserts it into a binary search tree. When the text contains non-ASCII (Unicode) characters, the output shows seemingly random characters in their place, apparently the individual bytes of the multibyte character, instead of the character itself. For example, yarns—and is output as yarnsùand. I spent hours on this months ago and again this week without solving it. How do I read and print these characters correctly?
Here is an MRE of the bug:
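For context, the word-cleaning step described above works roughly like the sketch below. This is a simplified illustration, not my actual code: `trimWord` and `addWord` are hypothetical names, and `std::map` stands in for my own binary search tree. It assumes narrow `char` strings and the default "C" locale, where any byte outside ASCII counts as non-alphabetic.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: strips non-alphabetic characters from both ends of a word.
std::string trimWord(const std::string& word)
{
    // Cast to unsigned char: passing a negative char to std::isalpha is undefined behavior.
    auto isAlpha = [](char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; };

    std::size_t first = 0;
    while (first < word.size() && !isAlpha(word[first]))
        ++first;

    std::size_t last = word.size();
    while (last > first && !isAlpha(word[last - 1]))
        --last;

    return word.substr(first, last - first);
}

// Hypothetical helper: counts a word in the concordance.
// std::map is only a stand-in here for the hand-written BST.
void addWord(std::map<std::string, int>& concordance, const std::string& rawWord)
{
    const std::string word = trimWord(rawWord);
    if (!word.empty())
        ++concordance[word];
}
```

Only the ends of a word are trimmed, so a non-ASCII character in the middle of a token (as in yarns—and) stays in the stored word and ends up in the output.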
#include <string>
#include <iostream>
#include <fstream>
#include <windows.h>
#include <consoleapi2.h>
using namespace std;
int main()
{
    wfstream file;
    file.open("Example.txt", ios::in);
    // Changes buffer from char to wchar_t
    wchar_t* buffer = new wchar_t[100];
    file.rdbuf()->pubsetbuf(buffer, 100);
    wchar_t CurrentStreamCharacter = file.get();
    wstring NewWord = L"";
    while (file)
    {
        NewWord.push_back(CurrentStreamCharacter);
        CurrentStreamCharacter = file.get();
    }
    //SetConsoleOutputCP(65001);
    wcout << NewWord << endl;
    wcout << "yarns—and even convictions. The Lawyer—the best of old fellows—had,";
    return 0;
}