Java, by default, uses UTF-16 to represent characters in the String data type.
I inherited a JavaFX project which currently has some Strings in UTF-8 and others in UTF-16. This is causing bugs (in pop-ups, for example), and I’m at a stage where I must provide some uniformity and choose one or the other. Do note that for the pop-ups I must use UTF-8, because UTF-16 doesn’t show the characters correctly (I’m not sure why this happens, nor is that the focus of this question).
If Java used UTF-8 by default, I would absolutely use it as well because it is the de facto encoding for the foreseeable future. However, since Java uses UTF-16 by default, I was thinking of changing everything to UTF-16 to be consistent with the language, and then if need be when creating these pop-ups convert to UTF-8. Since there are many pitfalls associated with this encoding, of which a good summary is [1], I’m scared that I’m making the wrong decision.
So, which encoding should I use to store my String variables?
A similar question [2] was asked but for PHP and not between UTF-16 and UTF-8. I believe this qualifies as a different question due to Java natively using UTF-16.
[1] – Should UTF-16 be considered harmful?
[2] – Should I convert the whole project to UTF-8?
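Java uses UTF-16 internally, but nobody needs to care about that, apart from a small effect on efficiency. A String holds Unicode text, not "UTF-8 data" or "UTF-16 data"; an encoding only comes into play when the text is converted to or from bytes.
UTF-8 is what nearly everyone uses as the standard for external representation.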
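As a rough illustration of that point, here is a minimal sketch of where the encoding choice shows up: only at the boundary between a String and bytes. The class name `EncodingDemo` and its helper methods are made up for this example and are not from your project; the sketch assumes the bytes you read or write are meant to be UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Hypothetical helper: encode text as UTF-8 when it leaves the program
    // (file, socket, etc.).
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: decode UTF-8 bytes coming into the program.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café", written with an escape to avoid source-encoding issues

        byte[] utf8 = toUtf8(text);
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16BE);

        // Same String, different external byte representations.
        System.out.println(utf8.length);   // 5 (the accented letter takes two bytes)
        System.out.println(utf16.length);  // 8 (four characters, two bytes each)

        // Decoding with the charset the bytes were written in restores the text.
        System.out.println(fromUtf8(utf8).equals(text)); // true

        // Decoding with a different charset does not: this is the usual source of garbled text.
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals(text)); // false
    }
}
```

If the pop-ups show garbled characters, one possible cause (an assumption on my part, since the question does not show the code) is a step like the last one, where bytes are decoded with a charset other than the one they were written in. Passing the charset explicitly at every such boundary avoids depending on the platform default.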