I’m going to be writing an application with a pure HTML5 and JavaScript front end and an ASP.NET MVC back end.
We have .resx files that get compiled to .js files to provide resources for the HTML5 application. The application has to work in English and in Chinese, which I understand to mean that we need to use UTF-16 everywhere.
Does anyone have experience using UTF-16 for such a task, or know of any best practices for it?
Why do you have this understanding? Both encodings (UTF-8 and UTF-16) can encode every Unicode character; that is what being a Unicode encoding means.
In any case, UTF-8 is more efficient than UTF-16 for storage and transmission in your situation. The majority of the characters in your files will not be Chinese but markup and JS syntax. UTF-8 uses 1 byte for each of those, whereas UTF-16 uses 2 bytes, so UTF-8 wins.
For common Chinese characters UTF-8 needs 3 bytes and UTF-16 needs 2 bytes. Both need 4 bytes for the rarer characters on the supplemental planes. This gives 33% savings for UTF-16 per Chinese character.
UTF-8 uses 1 byte for any “programming character”. <div> is 5 bytes in UTF-8 and 10 bytes in UTF-16. 50% savings for UTF-8 per “programming character”.
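The byte counts above are easy to check yourself. Here is a minimal sketch, assuming a runtime that provides `TextEncoder` (current browsers and recent Node.js versions do). The helper names `utf8Bytes` and `utf16Bytes` and the sample markup are made up for illustration, and the UTF-16 figure assumes no byte order mark.

```javascript
// Number of bytes the string occupies when encoded as UTF-8.
function utf8Bytes(text) {
  return new TextEncoder().encode(text).length;
}

// Number of bytes the string occupies when encoded as UTF-16 (no BOM).
// JavaScript strings are sequences of UTF-16 code units, 2 bytes each.
function utf16Bytes(text) {
  return text.length * 2;
}

const samples = [
  "<div>",                          // markup only
  "中",                             // common Chinese character (U+4E2D)
  "\u{20000}",                      // rarer character on a supplementary plane
  '<div class="title">中文</div>',  // hypothetical mixed markup and Chinese text
];

for (const s of samples) {
  console.log(s, "UTF-8:", utf8Bytes(s), "UTF-16:", utf16Bytes(s));
}
```

For the mixed sample, the markup outweighs the two Chinese characters, so UTF-8 comes out smaller overall, which is likely the typical case for resource files embedded in HTML and JS.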