I’m looking to build a Virtual Machine as a platform independent way to run some game code (essentially scripting).
The Virtual Machines that I’m aware of in games are rather old: Infocom’s Z-Machine, LucasArts’ SCUMM, id Software’s Quake 3. As a .NET developer, I’m familiar with the CLR and looked into the CIL instructions to get an overview of what you actually implement at the VM level (vs. the language level). I’ve also dabbled a bit in 6502 assembler during the last year.
The thing is, now that I want¹ to implement one, I need to dig a bit deeper. I know that there are stack based and register based VMs, but I don’t really know which one is better at what and if there are more or hybrid approaches. I need to deal with memory management, decide which low level types are part of the VM and need to understand why stuff like ldstr works the way it does.
My only reference book (apart from the Z-Machine stuff) is the CLI Annotated Standard, but I wonder if there is a better, more general/fundamental text on VMs? Basically something like the Dragon Book, but for VMs? I’m aware of Donald Knuth’s Art of Computer Programming, which uses a register-based VM, but I’m not sure how applicable that series still is, especially since it’s still unfinished?
Clarification: The goal is to build a specialized VM. For example, Infocom’s Z-Machine contains OpCodes for setting the Background Color or playing a sound. So I need to figure out how much goes into the VM as OpCodes vs. the compiler that takes a script (language TBD) and generates the bytecode from it, but for that I need to understand what I’m really doing.
¹ I know, modern technology would allow me to just interpret a high-level scripting language on the fly. But where is the fun in that? 🙂 It’s also a bit hard to google, because “virtual machine” is nowadays often associated with VMware-type OS virtualization…
I’d start by checking Lua. Both as a sample implementation, and as a very usable VM/language out of the box if you finally decide not to roll your own.
The source code is very readable, and there’s also an annotated version of the source code, as well as some design documents written by the main author, Roberto Ierusalimschy.
Finally, if you choose to use it instead of your own, you’ll find that it has long been a favorite among game developers, and there’s a very high-performance JIT implementation.
As for stack-based vs. register-based designs, I think stack-based VMs are easier to design, but the compiler can be more complex. As the Ierusalimschy paper notes, Lua was one of the first register-based language VMs; several others have appeared since, most notably LLVM, Dalvik, and some modern JavaScript VMs.
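To make the stack vs. register distinction a bit more concrete, here is a minimal sketch in Python (chosen only for brevity; it is not how Lua or any real VM is implemented). The opcode names `LOAD`, `ADD`, `MUL` and `RET`, the tuple encoding, and both `run_*` functions are made up for illustration. Both programs compute `(a + b) * c`.

```python
# Hypothetical opcodes and encodings, for illustration only.

def run_stack(code, env):
    """Stack machine: operands are implicit (whatever is on top of the stack)."""
    stack = []
    for op, *args in code:
        if op == "LOAD":            # push the value of a named variable
            stack.append(env[args[0]])
        elif op == "ADD":           # pop two values, push their sum
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "MUL":           # pop two values, push their product
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()


def run_register(code, env, num_regs=4):
    """Register machine: every instruction names its operand slots explicitly."""
    regs = [0] * num_regs
    for op, *args in code:
        if op == "LOAD":            # LOAD dst, name
            dst, name = args
            regs[dst] = env[name]
        elif op == "ADD":           # ADD dst, src1, src2
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        elif op == "MUL":           # MUL dst, src1, src2
            dst, a, b = args
            regs[dst] = regs[a] * regs[b]
        elif op == "RET":           # RET src
            return regs[args[0]]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("program ended without RET")


env = {"a": 2, "b": 3, "c": 4}

stack_code = [
    ("LOAD", "a"), ("LOAD", "b"), ("ADD",),
    ("LOAD", "c"), ("MUL",),
]
register_code = [
    ("LOAD", 0, "a"), ("LOAD", 1, "b"), ("ADD", 0, 0, 1),
    ("LOAD", 1, "c"), ("MUL", 0, 0, 1), ("RET", 0),
]

stack_result = run_stack(stack_code, env)
register_result = run_register(register_code, env)
```

The stack version keeps each instruction tiny but moves every value through pushes and pops; the register version encodes operand slots in each instruction, so whatever generates the bytecode has to pick those slots.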
I don’t have any specific resources to link you to at the moment, but I’ve researched a similar topic in the past and found the Smalltalk VM to be a good learning aid as well. There are many academic papers and articles written about the byte codes used by Smalltalk, as well as writing interpreters and VMs to use that bytecode. A Google search for “smalltalk vm implementation” or “smalltalk bytecode interpreter” should yield lots of reading material.
If you’d like to see some source code or try out an implementation I recommend either the Squeak or Pharo versions.
The related language/VM Self might also interest you, as Self is basically Smalltalk with prototype-based objects (similar to JavaScript).
I would start by analyzing how [script] source code gets into your machine or runtime environment.
If you have something like <a onclick="dosomething();"> in HTML documents, then you will need a very fast compiler; bytecode execution speed does not matter that much in this case.
If your use cases are closer to Java/.NET, where you can afford full-blown compilation, then the VM architecture and bytecode structure will be closer to Java bytecode or IL.
Another criterion is what I call “glueness”. Scripting languages were originally developed as glue languages: scripts just define how to connect various native functions together (Perl, Python, Ruby, JS). In that case the efficiency of the VM and bytecode is far less critical than with Java/.NET, where most of your code consists of functions written in the language itself.
The last major criterion I would use is the extensibility of your language.
If you plan to add many native objects/functions implemented in, say, C++ to your language runtime, then your VM architecture should be “convenient” for integration with C++.
For example: if you plan to expose C++ objects to scripts as they are, then the only option for you will be reference counting for heap management (like Python; see boost::python as an example of integration). If you plan to use a moving/compacting heap/GC, then it will be a different story. Lua’s way of adding native stuff to the runtime is a bit tricky [for C++ developers].
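As a rough illustration of the “glue” and extensibility points, here is a small Python sketch of a VM whose only link to the host is a table of registered native functions. Everything in it is hypothetical: the `MiniVM` class, the `PUSH` and `CALL_NATIVE` opcodes, and the `set_background` / `play_sound` host functions (named after the Z-Machine example in the question). It does not address heap management or object lifetimes at all; it only shows one possible shape for the native-call boundary.

```python
class MiniVM:
    """Tiny stack VM; scripts reach the host only through registered natives."""

    def __init__(self):
        self.natives = {}   # name -> (callable, number of arguments)
        self.stack = []

    def register(self, name, func, argc):
        """Expose a host function to scripts under `name`."""
        self.natives[name] = (func, argc)

    def run(self, code):
        for op, *args in code:
            if op == "PUSH":                # push a constant
                self.stack.append(args[0])
            elif op == "CALL_NATIVE":       # call a registered host function
                func, argc = self.natives[args[0]]
                # Arguments were pushed in call order; take the top `argc` values.
                start = len(self.stack) - argc
                call_args = self.stack[start:]
                del self.stack[start:]
                result = func(*call_args)
                if result is not None:      # push the return value, if any
                    self.stack.append(result)
            else:
                raise ValueError(f"unknown opcode {op!r}")
        return self.stack


# Stand-ins for engine functions; a real game would call into its renderer/audio.
events = []

def set_background(color):
    events.append(("background", color))

def play_sound(name, volume):
    events.append(("sound", name, volume))


vm = MiniVM()
vm.register("set_background", set_background, 1)
vm.register("play_sound", play_sound, 2)

vm.run([
    ("PUSH", "blue"),
    ("CALL_NATIVE", "set_background"),
    ("PUSH", "door.wav"),
    ("PUSH", 0.5),
    ("CALL_NATIVE", "play_sound"),
])
```

With a design like this, the instruction set stays small and game-specific features live in the host table; the alternative is a dedicated opcode per feature, as the Z-Machine does for colors and sounds.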
In other words, try to define your typical use case first, and it will be easier to suggest what you should read.