Many VMs execute a language in binary form, known as ‘bytecode’, which is assembled from a human-readable ‘assembly’ language.
For example, the assembly instructions push 1, push 2, add are translated (I think) to a series of ones and zeroes, which is then executed by the VM.
Why? Why don’t VMs, the JVM for example, execute the assembly instructions directly?
They don’t have the limitation of physical computers, which can only handle ones and zeroes. The JVM could very well take textual instructions such as push 1, push 2 and execute them as they are. Why the additional step of compilation?
Here are a couple of reasons to think about:
Using a human-readable assembly language would waste space on disk and in memory. That has an impact on caching, and therefore on performance. In your example the instruction ‘push’ takes up four bytes. Why not compress the program by using one-byte tokens for all instructions instead of the human-readable strings?
It wastes cycles on the processor. Your VM probably has at least two instruction mnemonics that start with ‘p’. In order for your VM to figure out whether an instruction is ‘push’ or ‘pop’, it has to compare at least two bytes. It’s much more efficient if each instruction can be uniquely identified by looking at a single byte.
The argument to your instructions is a string representing a number. The string has to be converted to a binary format appropriate for the underlying CPU before it can be used in arithmetic. That conversion will take dozens of instructions all by itself. Why do that every time the program is run? It’s much more efficient to do it in a one-time pass when the byte code is created.
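To make both points concrete, here is a minimal sketch of a toy stack machine in Python. The opcode numbers (0x01 for push, 0x02 for add), the one-byte operand limit, and the function names assemble and run are all assumptions made up for this illustration; this is not how the JVM encodes its instructions. The assembler does the mnemonic lookup and the string-to-number conversion once, and the VM loop then identifies every instruction from a single byte.

```python
# Hypothetical opcode numbering for a toy stack machine (not the JVM's real encoding).
OPCODES = {"push": 0x01, "add": 0x02}

def assemble(source: str) -> bytes:
    """One-time pass: turn text mnemonics and decimal strings into bytes."""
    tokens = source.split()
    out = bytearray()
    i = 0
    while i < len(tokens):
        mnemonic = tokens[i]
        out.append(OPCODES[mnemonic])
        i += 1
        if mnemonic == "push":
            # The string-to-number conversion happens here, once, not at run time.
            out.append(int(tokens[i]) & 0xFF)  # operand limited to one byte in this sketch
            i += 1
    return bytes(out)

def run(code: bytes) -> list:
    """Execute bytecode; each instruction is identified by a single byte."""
    stack = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if op == 0x01:      # push <byte>
            stack.append(code[pc])
            pc += 1
        elif op == 0x02:    # add: pop two values, push their sum
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        else:
            raise ValueError(f"unknown opcode {op:#x}")
    return stack

program = assemble("push 1 push 2 add")
print(program.hex())   # 0101010202
print(run(program))    # [3]
```

In this sketch the 17-character source text shrinks to 5 bytes, and the run loop never compares strings or parses digits.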
Assembly is not binary code. Assembly is a textual representation of binary code. In order to execute it, you have to first run it through an assembler that generates byte code or machine code, and that takes time.
Although it is possible to write a program in a byte-code assembly language and run it through a byte-code assembler, most programmers don’t. Instead, they use a high-level language that compiles directly to byte code.
Consequently, the assembly step is not needed.
Byte code is not the same as the processor instructions that actually execute your program. The VM must interpret the byte code or translate it into processor instructions (for example with a just-in-time compiler), and only those processor instructions actually run on the CPU. It is done this way because byte code is machine-independent: you can use the same byte code on processors with completely different instruction sets, so long as you have a VM for each processor.
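One small ingredient of that machine independence is that the byte-code format fixes its own byte order instead of inheriting the host CPU's. As a rough sketch, assuming a hypothetical opcode PUSH_I32 = 0x10 followed by a 32-bit big-endian operand (an encoding invented for this example), Python's struct module can produce identical bytes on any host:

```python
import struct
import sys

PUSH_I32 = 0x10  # hypothetical opcode for "push a 32-bit integer"

def encode_push(value: int) -> bytes:
    # ">" forces big-endian layout regardless of the host CPU's native byte order.
    return struct.pack(">Bi", PUSH_I32, value)

def decode_push(code: bytes) -> int:
    opcode, value = struct.unpack(">Bi", code)
    if opcode != PUSH_I32:
        raise ValueError("not a push instruction")
    return value

encoded = encode_push(1000)
print(sys.byteorder)         # host-dependent: "little" or "big"
print(encoded.hex())         # 10000003e8 on every host
print(decode_push(encoded))  # 1000
```

A VM on each platform reads the same five bytes and maps them to whatever its own processor needs.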
The bytecode is more efficient to store and easier (hence faster) to parse for the VM.
The textual representation of the bytecode (the “assembly” language) is mainly used for understanding the internals of the bytecode compiler and/or for analyzing the optimization passes. As Robert Harvey said, the majority of programmers will never deal with the bytecode directly, so there is little reason to make the VM execute the textual form.
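CPython is a handy place to see this split in practice: the interpreter runs binary bytecode, and the standard-library dis module renders it as text for people who want to inspect it. The function name add_one_and_two below is just an illustration, and the exact instruction names differ between Python versions, so no particular output is shown here.

```python
import dis

def add_one_and_two():
    a = 1
    b = 2
    return a + b

# Collect the mnemonic of each bytecode instruction the compiler emitted.
names = [ins.opname for ins in dis.get_instructions(add_one_and_two)]
print(names)

# dis.dis prints a full human-readable listing (offsets, mnemonics, operands).
dis.dis(add_one_and_two)
```

The listing exists purely for the reader; the interpreter itself only ever sees the compact binary form.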
Because there are different architectures, and a common format can be adapted to the platform at hand.
Possibly also because the VM code can be smaller than the comparable platform machine code, since the scripting engine’s routines for garbage collection, hashes and stacks live in the engine and are only referenced from the bytecode.
Oh, and as mentioned: single-byte bytecodes are easier to design than more complicated multi-byte or word-sized codes.