When I use MALLET with German texts, either my own or the sample texts that come with MALLET, I never get a correct import: letters such as ä, ö and ü are always missing or garbled. Adding `--encoding UTF-8` does not help.
I'm working on Windows 11.
I would appreciate a hint on how to get German texts represented correctly.
MALLET assumes UTF-8 by default, and the German example texts it ships with are definitely UTF-8 encoded. It's also rare these days to see documents that aren't UTF-8. That makes me think the only remaining suspect is something in the operating system or the JVM. If you pass `--help` to the MALLET import command, it will show the default encoding under the `--encoding` option, which could help with debugging. I'm not sure, though, why setting `--encoding UTF-8` explicitly wouldn't override that default.
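To narrow down whether the JVM or the input files are the problem, a small standalone check can help. The sketch below is not part of MALLET; the class name `EncodingCheck` and the helper `isValidUtf8` are hypothetical. It prints the charset the JVM falls back to when no encoding is given (`Charset.defaultCharset()` and the `file.encoding` system property) and then tests whether each file passed on the command line decodes cleanly as strict UTF-8.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic helper, not part of MALLET.
public class EncodingCheck {

    // Returns true if the bytes decode as UTF-8 without any malformed
    // or unmappable sequences (the decoder reports errors instead of
    // silently replacing them).
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // The charset the JVM uses when a program does not name one explicitly.
        System.out.println("Charset.defaultCharset(): " + Charset.defaultCharset());
        System.out.println("file.encoding property:   " + System.getProperty("file.encoding"));

        // Check each input file given on the command line.
        for (String arg : args) {
            Path path = Paths.get(arg);
            byte[] bytes = Files.readAllBytes(path);
            System.out.println(path + ": " + (isValidUtf8(bytes) ? "valid UTF-8" : "NOT valid UTF-8"));
        }
    }
}
```

For example, compile with `javac EncodingCheck.java`, run `java EncodingCheck some-german-text.txt`, and compare with `java -Dfile.encoding=UTF-8 EncodingCheck some-german-text.txt`. If the default charset shows something like `windows-1252` while the files are valid UTF-8, that points at the JVM setup rather than the data. If both look fine, the garbling may only be happening when the output is displayed in the Windows console.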