As the title says, can all strings in JavaScript be converted into an array of 8-bit unsigned integers?
The answer will help me understand whether `new TextEncoder().encode(myString)` can convert every legal JS string into a `Uint8Array`, and if so, why.
My current understanding:
- Unicode is a character set, where each character maps to a number (its code point), written here in decimal: h=104, e=101, l=108, o=111, etc.
- UTF-8 is a way to encode a Unicode character into binary format, so it can be stored on a computer. 104=01101000, etc.
- 8-bit unsigned integers range from 0 to 255
- Unicode contains many characters whose number is above 255. For instance, the Cyrillic character Ѐ is Unicode 1024, which in turn can be encoded in binary with UTF-8.
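To make the points above concrete, here is a small sketch that prints the code point next to the UTF-8 bytes for two sample characters. The helper name `inspectChar` is made up for illustration; `codePointAt` and `TextEncoder` are the standard APIs.

```javascript
// Compare a character's Unicode number (code point) with its UTF-8 bytes.
const encoder = new TextEncoder();

// Hypothetical helper, not part of any library.
function inspectChar(ch) {
  return {
    codePoint: ch.codePointAt(0),             // the Unicode number
    bytes: Array.from(encoder.encode(ch)),    // the UTF-8 encoding, one entry per byte
  };
}

const latin = inspectChar('h');     // code point 104 fits in a single byte
const cyrillic = inspectChar('Ѐ');  // code point 1024 does not
console.log(latin, cyrillic);
```

The code point and the byte values are two different things: only for characters below 128 do they happen to coincide.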
The part I am struggling with is this: there are many characters above 255 in Unicode that are perfectly valid in JavaScript strings. How can they be converted to 8-bit unsigned integers? I tested a Cyrillic character with `new TextEncoder().encode('Ѐ')` and it returned a `Uint8Array` with two entries: `{0: 208, 1: 128}`.
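As a rough illustration of where 208 and 128 come from, the sketch below packs the bits of code point 1024 by hand into the two-byte UTF-8 pattern `110xxxxx 10xxxxxx`. The function `encodeTwoByte` is a hypothetical helper that only covers code points 0x80 to 0x7FF; it is not how `TextEncoder` is implemented, just the same bit layout.

```javascript
// Hand-rolled two-byte UTF-8 encoding, valid only for code points 0x80..0x7FF.
function encodeTwoByte(codePoint) {
  if (codePoint < 0x80 || codePoint > 0x7ff) {
    throw new RangeError('this sketch only handles the two-byte range');
  }
  const lead = 0b11000000 | (codePoint >> 6);               // 110xxxxx: upper 5 bits
  const continuation = 0b10000000 | (codePoint & 0b111111); // 10xxxxxx: lower 6 bits
  return [lead, continuation];
}

const manual = encodeTwoByte(0x0400); // Ѐ is code point 1024 (U+0400)
const builtIn = Array.from(new TextEncoder().encode('Ѐ'));
console.log(manual, builtIn);
```

Each byte stays within 0 to 255 because the code point's bits are spread across several bytes rather than squeezed into one.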
Does that mean that all characters in Unicode can be represented by one or more 8-bit unsigned integers?
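One way to probe this empirically is a round trip through `TextEncoder` and `TextDecoder`. The sample strings and the helper `roundTrips` below are my own choices for illustration. The last case uses a lone surrogate, which is a legal JS string but not well-formed Unicode text, so it is the kind of input where "all legal JS strings" and "all Unicode characters" may differ.

```javascript
const utf8Encoder = new TextEncoder();
const utf8Decoder = new TextDecoder(); // defaults to UTF-8

// Hypothetical helper: does the string survive encode followed by decode?
function roundTrips(str) {
  return utf8Decoder.decode(utf8Encoder.encode(str)) === str;
}

// Characters needing 1, 2, 3, and 4 bytes respectively.
const samples = ['hello', 'Ѐ', '€', '😀'];
const allRoundTrip = samples.every(roundTrips);

// A lone surrogate: TextEncoder substitutes U+FFFD (the replacement character).
const loneSurrogate = '\uD800';
const loneBytes = Array.from(utf8Encoder.encode(loneSurrogate));
const loneRoundTrips = roundTrips(loneSurrogate);
console.log(allRoundTrip, loneRoundTrips, loneBytes);
```

In this sketch every string still becomes a `Uint8Array`, but the lone surrogate does not come back unchanged, since its bytes are those of U+FFFD.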