I’m about to implement my own very simple programming language, and an interpreter to execute code in that language.
The language will be very basic. Example code:
```
var x = 3
if x > 2 print x
if x < 2 print "hello"
```
The language won’t have anything more complex than single-line if statements. This is my first attempt at something like this, so I’m starting small.
The interpreter will be written in Java, so it will execute the code using Java operations.
My question is this:
I know that the topic of creating compilers and interpreters is very complex. But since I’m new to this, I believe I should start with basic techniques and approaches.
What should I learn before starting to work on this project? In particular, what should I know about parsing and interpreting before I start?
Is it enough to just ‘break down the text into substrings, and then more substrings’, or should I learn more advanced techniques and apply them?
The knowledge and experience I gain should give me a foundation to build on as I keep learning and implementing interpreters, but it shouldn’t be ‘too much’ for a first attempt.
It’s going to depend a bit on the path you take for your interpreter. But the first thing you might want to try creating is a basic mathematical expression evaluator, because ultimately that can form the core of an interpreter: something that calculates expressions like (8 * 4) + 10 and ((10 + 2) * (11 / 5)). This will expose you to parsing code, breaking everything into tokens, stacks, and other concepts without completely overwhelming you.
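A minimal sketch of the tokenizing step, in Python for brevity (the same logic ports directly to Java). It assumes only non-negative integer literals, the binary operators + - * /, and parentheses. Unary minus is not handled, and `tokenize` and `TOKEN_PATTERN` are hypothetical names.

```python
import re

# Hypothetical token pattern: a run of digits, or one operator/parenthesis character.
TOKEN_PATTERN = re.compile(r"\d+|[-+*/()]")


def tokenize(text):
    """Split an arithmetic expression into tokens.

    Integer literals become int values; operators and parentheses stay as
    one-character strings. Whitespace is skipped.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN_PATTERN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        lexeme = match.group()
        tokens.append(int(lexeme) if lexeme.isdigit() else lexeme)
        pos = match.end()
    return tokens


print(tokenize("(8 * 4) + 10"))  # ['(', 8, '*', 4, ')', '+', 10]
```

Raising an error on unknown characters, rather than silently skipping them, makes typos in the source visible early.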
Once you have it working with math expressions, extend it with functions:
(8 * 4) + max(5, 10)
Then add data types:
"this" + "that"
That’s a fairly large project, and it should give you a good feel for everything that’s involved.
For starters, look at the Shunting Yard Algorithm.
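A compact sketch of the Shunting Yard Algorithm follows. It assumes the input is already a list of tokens (ints plus one-character operator and parenthesis strings) and that all four operators are left-associative binary operators. `to_rpn` reorders the tokens into Reverse Polish Notation, and `eval_rpn` then evaluates them with a value stack. The names and the `OPERATORS` table are illustrative, and function calls such as max are not handled.

```python
import operator

# Binary operators: precedence (higher binds tighter) and the function to apply.
# All four are treated as left-associative.
OPERATORS = {
    "+": (1, operator.add),
    "-": (1, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
}


def to_rpn(tokens):
    """Shunting yard: reorder infix tokens into Reverse Polish Notation."""
    output, stack = [], []
    for tok in tokens:
        if isinstance(tok, (int, float)):
            output.append(tok)
        elif tok in OPERATORS:
            # Pop operators that bind at least as tightly (left associativity).
            while (stack and stack[-1] in OPERATORS
                   and OPERATORS[stack[-1]][0] >= OPERATORS[tok][0]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise SyntaxError("unbalanced ')'")
            stack.pop()  # discard the matching "("
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
    while stack:
        top = stack.pop()
        if top == "(":
            raise SyntaxError("unbalanced '('")
        output.append(top)
    return output


def eval_rpn(rpn):
    """Evaluate an RPN token list using a stack of values."""
    values = []
    for tok in rpn:
        if isinstance(tok, str):
            right = values.pop()
            left = values.pop()
            values.append(OPERATORS[tok][1](left, right))
        else:
            values.append(tok)
    return values[0]


tokens = ["(", 8, "*", 4, ")", "+", 10]
print(to_rpn(tokens))            # [8, 4, '*', 10, '+']
print(eval_rpn(to_rpn(tokens)))  # 42
```

Splitting conversion from evaluation keeps each stack's job simple: the first stack holds pending operators, the second holds intermediate results.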
I think you have to learn at least two main concepts to successfully implement your basic interpreter: language processing and expression evaluation.
You might be able to get somewhere with manual string operations, but it will be tedious and probably have serious limitations or bugs. The right way to do it is to use a language processing library. The purpose of this kind of software is to transform the source code into an easy-to-use in-memory representation called an Abstract Syntax Tree (AST). For example, 3 * (5 + 1) will be transformed into a tree like this:
```
    *
   / \
  3   +
     / \
    5   1
```
There is a little bit of theory to understand in order to implement this transformation successfully, but the skill will be reusable for full-featured languages as well as domain-specific languages. For Java, ANTLR is a pretty common library for language recognition; you should check it out: http://www.antlr.org/
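To see what the transformation involves before reaching for a library, here is a hand-written recursive-descent parser. It is an alternative to a generated parser, not ANTLR's API. It assumes a tiny grammar with only integers, +, * and parentheses, and `Node`, `Parser` and the grammar are illustrative assumptions. For 3 * (5 + 1) it builds the tree drawn above.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str                      # operator symbol or integer literal
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class Parser:
    """Recursive descent for:  expr   := term ('+' term)*
                               term   := factor ('*' factor)*
                               factor := NUMBER | '(' expr ')'"""

    def __init__(self, text):
        # findall silently skips characters that match neither alternative (e.g. spaces).
        self.tokens = re.findall(r"\d+|[+*()]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() == "+":
            self.advance()
            node = Node("+", node, self.term())
        return node

    def term(self):
        # '*' is parsed one level deeper than '+', so it binds tighter.
        node = self.factor()
        while self.peek() == "*":
            self.advance()
            node = Node("*", node, self.factor())
        return node

    def factor(self):
        tok = self.advance()
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and tok.isdigit():
            return Node(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


print(Parser("3 * (5 + 1)").parse())
```

Each grammar rule becomes one method, and operator precedence falls out of which method calls which. The `label`/`left`/`right` fields match the attributes read by the `evaluate` function in this answer.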
The second skill is evaluating the AST. You need to learn about tree-like data structures and how to work with them. This will enable you to evaluate the AST, obtain its result, and perform the actions requested by the program. Here is a very simple example of the kind of code you will need to write to evaluate the AST above:
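```python
def evaluate(tree):
    # Operator node: evaluate both subtrees, then combine the results
    if tree.label == "*":
        return evaluate(tree.left) * evaluate(tree.right)
    elif tree.label == "+":
        return evaluate(tree.left) + evaluate(tree.right)
    else:
        # Leaf node: the label holds an integer literal such as "3"
        return int(tree.label)
```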
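The same recursive pattern extends from expressions to the statements in the question's example (var, if, print). The sketch below assumes a hypothetical tuple-based AST, roughly what a parser might produce for those three lines. The node shapes and the names `eval_expr`, `execute` and `run` are illustrative. Printed values are collected in a list so the output is easy to inspect.

```python
def eval_expr(node, env):
    """Evaluate an expression node; env maps variable names to values."""
    kind = node[0]
    if kind in ("num", "str"):
        return node[1]
    if kind == "name":
        return env[node[1]]  # KeyError if the variable was never declared
    if kind == ">":
        return eval_expr(node[1], env) > eval_expr(node[2], env)
    if kind == "<":
        return eval_expr(node[1], env) < eval_expr(node[2], env)
    raise ValueError(f"unknown expression {kind!r}")


def execute(stmt, env, out):
    """Run one statement, updating env and appending printed values to out."""
    kind = stmt[0]
    if kind == "var":
        env[stmt[1]] = eval_expr(stmt[2], env)
    elif kind == "if":
        if eval_expr(stmt[1], env):
            execute(stmt[2], env, out)
    elif kind == "print":
        out.append(eval_expr(stmt[1], env))
    else:
        raise ValueError(f"unknown statement {kind!r}")


def run(program):
    env, out = {}, []
    for stmt in program:
        execute(stmt, env, out)
    return out


# var x = 3 / if x > 2 print x / if x < 2 print "hello"
program = [
    ("var", "x", ("num", 3)),
    ("if", (">", ("name", "x"), ("num", 2)), ("print", ("name", "x"))),
    ("if", ("<", ("name", "x"), ("num", 2)), ("print", ("str", "hello"))),
]
for value in run(program):
    print(value)  # prints 3
```

The environment dictionary is what gives `var` its meaning: expressions read from it, and declarations write to it.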
To go further, you could also try writing a compiler instead of an interpreter. The first step is the same, but instead of evaluating the nodes of the tree, you output some kind of machine code that does the job afterwards.
Real hardware assembly can be difficult, but JVM bytecode is pretty simple and has decent performance. There are libraries that can handle the binary side of it for you, for example ASM: http://asm.ow2.org/ This is generally how JVM languages work.
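The compiler idea can be sketched without any library. Walk the same kind of tree, but instead of computing values, emit instructions for a stack machine and run them later. The instruction names (push, add, mul) are hypothetical and only loosely resemble JVM opcodes such as iadd and imul. Emitting real class files is the part a library like ASM takes care of. To keep the example self-contained, the tree is assumed to be a nested tuple (operator, left, right) with int leaves.

```python
# Hypothetical mapping from AST operators to toy stack-machine instructions.
OPCODES = {"+": "add", "*": "mul"}


def compile_expr(tree, code):
    """Append instructions for tree to code using a post-order traversal."""
    if isinstance(tree, int):
        code.append(("push", tree))
    else:
        op, left, right = tree
        compile_expr(left, code)   # operands first...
        compile_expr(right, code)
        code.append((OPCODES[op],))  # ...then the operator that consumes them
    return code


def run_stack_code(code):
    """Execute the instruction list on an operand stack."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if instr[0] == "add" else left * right)
    return stack.pop()


code = compile_expr(("*", 3, ("+", 5, 1)), [])
print(code)                  # [('push', 3), ('push', 5), ('push', 1), ('add',), ('mul',)]
print(run_stack_code(code))  # 18
```

Compilation happens once and produces a flat instruction list, so running the program no longer needs the tree at all.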
Good luck!