I’m having problems trying to manually generate a UTF-16 encoding for a Unicode codepoint, following the instructions here:
Manually converting unicode codepoints into UTF-8 and UTF-16
The example in question is the Unicode character “FACE WITH OPEN MOUTH”, decimal code point 128558, hex code point 1F62E, according to:
https://www.cogsci.ed.ac.uk/~richard/utf-8.cgi?input=%F0%9F%98%AE&mode=char
1st step (Subtract 10000 hex from the codepoint):
1F62E (hex) - 1000 (hex) = 1E62E (hex) = 1 1110 0110 0010 1110 (binary)
2nd step (Express result as 20-bit binary):
0001 1110 0110 0010 1110
3rd step (Use the pattern 110110xxxxxxxxxx 110111xxxxxxxxxx (binary) to encode the upper and lower 10 bits into two 16-bit words):
1st word (with the upper 10 bits):
110110 0001111001 = D879 (hex)
2nd word (with the lower 10 bits):
110111 1000101110 = DE2E (hex)
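To double-check step 3, I also reproduced it as a small Python sketch, starting from my 20-bit value from step 2; it gives the same two words as my manual calculation:

```python
# Reproduce my manual step 3: prepend the 6-bit surrogate prefixes
# to the upper and lower 10 bits of my 20-bit value from step 2.
value = 0b0001_1110_0110_0010_1110   # my 20-bit result (1E62E hex)
upper10 = value >> 10                # upper 10 bits: 0001111001
lower10 = value & 0x3FF              # lower 10 bits: 1000101110
word1 = (0b110110 << 10) | upper10
word2 = (0b110111 << 10) | lower10
print(hex(word1), hex(word2))        # 0xd879 0xde2e (same as above)
```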
So the first word does not match the value shown at https://www.cogsci.ed.ac.uk/~richard/utf-8.cgi?input=%F0%9F%98%AE&mode=char
It should be D83D, but I get D879. What have I done wrong?
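For what it’s worth, Python’s built-in UTF-16 codec also gives D83D DE2E for this character, matching the website:

```python
# Cross-check the expected encoding with Python's built-in codec.
ch = "\U0001F62E"                    # FACE WITH OPEN MOUTH
print(ch.encode("utf-16-be").hex())  # prints "d83dde2e", i.e. D83D DE2E
```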
Many thanks.