I created 2 empty files in VS Code, one with UTF-8 encoding (`file.utf8.txt`) and the other with GB 2312 encoding (`file.gb2312.txt`).
If we use the `od` command to view the contents of these 2 empty files, we find that the character encoding of a file is not stored in the file itself (text files have no header in which to store metadata such as the encoding). So far, nothing unexpected.
```
$ touch file.utf8.txt
$ touch file.gb2312.txt
$ od -t x1 file.utf8.txt
$ od -t x1 file.gb2312.txt
```
Next, I start the Chinese input method (IME) and type the same Latin letter sequence “zhongwen” to enter the 2 Chinese characters “中文” into each of the 2 files. In both files, the characters display correctly, without glitches or mojibake.
However, when I use the `od` command to view the contents of the 2 files again, I find that the IME seems to produce different output for the same input “zhongwen”:
```
$ od -t x1 file.gb2312.txt
0000000 d6 d0 ce c4
0000004
$ od -t x1 file.utf8.txt
0000000 e4 b8 ad e6 96 87
0000006
```
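As a quick cross-check (a small Python sketch, not part of the original experiment; it assumes Python 3.8+ for `bytes.hex` with a separator), the two byte sequences above are exactly the standard GB 2312 and UTF-8 encodings of the same two characters, whose Unicode code points are U+4E2D and U+6587:

```python
# Encode "中文" with the two encodings used in the question and print the bytes
# in the same space-separated hex layout that `od -t x1` uses.
text = "中文"

for ch in text:
    # The code point identifies the character independently of any file encoding.
    print(ch, f"U+{ord(ch):04X}")

utf8_bytes = text.encode("utf-8")      # expected to match file.utf8.txt
gb2312_bytes = text.encode("gb2312")   # expected to match file.gb2312.txt

print("utf-8 :", utf8_bytes.hex(" "))
print("gb2312:", gb2312_bytes.hex(" "))
```

Both byte sequences decode back to the same string, so the characters the IME delivered are identical; only the bytes written to disk differ.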
I wonder whether the IME can “telemeter” (via some inter-process communication trick) which encoding the editor is currently using and convert its output accordingly. Or does the IME always produce the same output for the same input (for example, always the Unicode code points of the 2 Chinese characters “中文”, regardless of the editor's current encoding), with the operating system (or some other magical component) then converting that output into UTF-8 or GB 2312 as appropriate?