I’m reading a compiler textbook that compiles to some form of assembly. Since I don’t know this assembly language I decided to invent my own simple “assembly language” and implement a basic “virtual machine” which will execute these instructions.
Currently I’m thinking how I can implement function decleration and function calling in the language and VM.
I had the following idea, please tell me what you think:
Function declerations would look like simple labels. At the end of a function there’s an end
statement. The ‘main program’ is a function by itself. For examle:
main:
// some logic
CALL funcA
// more logic
END
funcA:
// .. some logic
END
However the difference between call <function>
and goto <label>
is this:
goto
simply sets the ‘instruction pointer’ (the register that holds the number of the next line to be executed) to the line number of the label.
call
does what goto
does, but before jumping it pushes the current line number (plus 1) onto a stack.
When end
is reached, the VM pops the top of the stack and jumps to this line number. If there is nothing in the stack, the program terminates.
So for example, for the code above this is what happens:
main: // this is where the VM starts
// some logic
CALL funcA // push onto the stack the number 4 (3+1), and jump to the label funcA
// more logic .. this is where we return to from funcA
END // pop the top of the stack and jump to this line number. nothing -> terminate
funcA:
// .. some logic
END // pop the top of the stack (the number 4), and jump to this line number
What do you think of this approach?
9
It depends on the VM, to be honest.
It is a bit easier to write a VM in an OO language that operates on data structures kept in memory rather than on some form of linearized bytecode. If you ever implement a toy programming language you might decide to do it this way, at least initially (I have done this before), in which case the VM is probably just a polymorphic “run()” method. A lot of scripting languages work this way, because the code will be generated at the same time in which it is run, so there is no need to maintain the bytecode for later.
If you compile all the way to a linearized bytecode (like how Java compiles), the way you handle operation calls will depend on the language. If you do not allow recursion (most old programming languages originally did not), you can simply inline the operation calls directly into the bytecode. This causes code bloat but is “safe” otherwise.
If you compile all the way to a linearized bytecode, but you want to allow recursion, then you’ll need to emulate a call stack. Each frame of the stack will hold a reference to the location in the code segment where that operation was invoked (for operation-based frames). It will also hold copies of all variables local to that frame. Objects to which those variables refer may be held on the stack or in a separate heap segment, depending on the design of the programming language and (more likely) the code optimizer being used by the compiler.