My code receives data described as wide chars, four bytes per character, that should be converted to UTF-8. I can enter a string in a GUI on a Windows PC and look at the received bytes on my end on Linux, but I have no access to the encoding logic.
When the input consists of purely ASCII values, decoding seems straightforward:
Input: Xaver
Received data: 58 00 00 00 61 00 00 00 76 00 00 00 65 00 00 00 72 00 00 00
Decoding:
0x00 00 00 58: X
0x00 00 00 61: a
0x00 00 00 76: v
0x00 00 00 65: e
0x00 00 00 72: r
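In other words, each character appears to arrive as one 32-bit little-endian code point, so the ASCII-only dump decodes cleanly as UTF-32LE. A minimal Python sketch (the hex string is just the dump above pasted in by hand):

    # Interpret the received bytes as UTF-32LE,
    # i.e. one 32-bit little-endian code point per character.
    data = bytes.fromhex("58000000 61000000 76000000 65000000 72000000")
    print(data.decode("utf-32-le"))  # prints: Xaver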
As soon as I add a non-ASCII character to the input, things get strange:
Input: Xäver
Received data: 58 00 00 00 05 00 00 00 30 9C 7A 00 9C 5B E0 FF 24 56 E0 FF 18 56 E0 FF 43 B9 73 00 62 00 00 00 70 5A E0 FF
Decoding:
0x00 00 00 58: X
0x00 00 00 05: ASCII ENQ?
0x00 7A 9C 30: ?
0xFF E0 5B 9C: ?
0xFF E0 56 24: ?
0xFF E0 56 18: ?
0x00 73 B9 43: ?
0x00 00 00 62: b
0xFF E0 5A 70: ?
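For reference, the words in the table above are simply the received bytes grouped into 32-bit little-endian values, e.g. with this small sketch (dump pasted in by hand):

    import struct

    # Group the received bytes into 32-bit little-endian words,
    # which is how the decoding table above was built.
    dump = bytes.fromhex(
        "58000000 05000000 309C7A00 9C5BE0FF 2456E0FF"
        " 1856E0FF 43B97300 62000000 705AE0FF"
    )
    words = struct.unpack(f"<{len(dump) // 4}I", dump)
    print(" ".join(f"0x{w:08X}" for w in words))
    # 0x00000058 0x00000005 0x007A9C30 0xFFE05B9C ...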
Out of curiosity, I put in another string containing an umlaut:
Input: abcä
Received data: 61 00 00 00 62 00 00 00 63 00 00 00 04 3B 21 F7 8A 9C 15 F7 38 BF D3 FF
Decoding:
0x00 00 00 61: a
0x00 00 00 62: b
0x00 00 00 63: c
0xF7 21 3B 04: ?
0xF7 15 9C 8A: ?
0xFF D3 BF 38: ?
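Decoding these dumps as UTF-32LE, the way the ASCII case worked, fails outright: words like 0xF7213B04 here (or 0xFFE05B9C in the previous dump) are far above the Unicode maximum of U+10FFFF, so a strict decode stops right where the ä should be:

    # Trying to decode the "abcä" dump the same way fails, because
    # 0xF7213B04 etc. are not valid Unicode code points (max is U+10FFFF).
    data = bytes.fromhex(
        "61000000 62000000 63000000 043B21F7 8A9C15F7 38BFD3FF"
    )
    try:
        print(data.decode("utf-32-le"))
    except UnicodeDecodeError as err:
        print(err)  # the 4-byte value where the 'ä' should be is rejected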
I’m a little confused by the different values received for the umlaut ä in the two cases and can’t wrap my head around the reason for this strange encoding.
What is happening here? Does anybody have an explanation for this or an idea on how to proceed?