How do programming languages define and save functions/methods? I am creating an interpreted programming language in Ruby, and I am trying to figure out how to implement function declaration.
My first idea is to save the content of the declaration in a map. For example, if I did something like
def a() {
callSomething();
x += 5;
}
Then I would add an entry into my map:
{
'a' => 'callSomething(); x += 5;'
}
The problem with this is that it would become recursive, because I would have to call my parse
method on the string, which would then call parse
again when it encountered doSomething
, and then I would run out of stack space eventually.
So, how do interpreted languages handle this?
8
Would I be correct in assuming that your “parse” function not only parses the code but also executes it at the same time? If you wanted to do it that way, instead of storing the contents of a function in your map, store the location of the function.
But there’s a better way. It takes a bit more effort up-front, but it yields much better results as complexity increases: use an Abstract Syntax Tree.
The basic idea is that you only parse the code once, ever. Then you have a set of data types representing operations and values, and you make a tree of them, like so:
def a() {
callSomething();
x += 5;
}
becomes:
Function Definition: [
Name: a
ParamList: []
Code:[
Call Operation: [
Routine: callSomething
ParamList: []
]
Increment Operation: [
Operand: x
Value: 5
]
]
]
(This is just a text representation of the structure of a hypothetical AST. The actual tree would probably not be in text form.) Anyway, you parse your code out into an AST, and then you either run your interpreter over the AST directly, or use a second (“code generation”) pass to turn the AST into some output form.
In the case of your language, what you would probably do is have a map that maps function names to function ASTs, instead of function names to function strings.
6
You shouldn’t be calling parse upon seeing callSomething()
(I presume you meant callSomething
rather than doSomething
). The difference between a
and callSomething
is that one is a method definition while the other is a method call.
When you see a new definition, you’ll want to do checks related to ensuring you can add that definition, so:
- Check if function doesn’t already exist with same signature
- Ensure that method declaration is being performed in the proper scope (i.e. can methods be declared inside other method declarations?)
Assuming these checks pass, you can add it to your map and begin checking the contents of that method.
When you find a method call like callSomething()
, you should perform the following checks:
- Does
callSomething
exist in your map? - Is it being called properly (number of arguments matches signature that you have found)?
- Are arguments valid (if variable names are used, are they declared? can they be accessed at this scope?)?
- Can callSomething be called from where you are (is it private, public, protected?)?
If you find that callSomething()
is okay, then at this point what you would want to do really depends on how you wish to approach it. Strictly speaking, once you know that such a call is okay at this point, you could only save the name of the method and the arguments without going into further details. When you run your program, you will invoke the method with the arguments you should have at runtime.
If you want to go further, you could save not just the string but a link to the actual method. This would be more efficient, but if you have to manage memory, it can get confusing. I would recommend you simply hold onto the string at first. Later you can try to optimize.
Note that this is all assuming that you’ve lexxed your program, which means you have recognized all tokens in your program and know what they are. That isn’t to say you know if they make sense together yet, which is the parsing phase. If you don’t yet know what the tokens are, I suggest you first focus on getting that information first.
I hope that helps! Welcome to Programmers SE!
Reading your post, I noticed two questions in your question. The most important one is how to parse. There are many kinds of parsers (e.g. Recursive descent parser, LR Parsers, Packrat Parsers) and parser generators (e.g. GNU bison, ANTLR) you can use to traverse a textual program “recursively” given a (explicit or implicit) grammar.
The second question is about storage format for functions. When you are not doing syntax-directed translation, you create an intermediate representation of your program, which can be an abstract syntax tree, or some custom intermediate language, in order to do further processing with it (compile, transform, execute, write on a file, etc).
From a generic point of view, the definition of a function is little more than a label, or bookmark, in the code. Most other loop, scope and conditional operators are similar; they’re stand-ins for a basic “jump” or “goto” command in the lower levels of abstraction. A function call basically boils down to the following low-level computer commands:
- Concatenate the data of all parameters, plus a pointer to the next instruction of the current function, into a structure known as a “call stack frame”.
- Push this frame onto the call stack.
- Jump to the memory offset of the first line of the function’s code.
A “return” statement or similar will then do the following:
- Load the value to be returned into a register.
- Load the pointer to the caller into a register.
- Pop the current stack frame.
- Jump to the caller’s pointer.
Functions, therefore, are simply abstractions in a higher-level language specification, which allow humans to organize code in a more maintainable and intuitive way. When compiled into an assembly or intermediate language (JIL, MSIL, ILX), and definitely when rendered as machine code, almost all such abstractions go away.
actually function is a group of code it only contain a common identity (ie it is detected
by its name)
use an abstract syntax tree
just like statement and expression a function is also a statement list having a name
suppose your function name is add then(add()) some node of ast will look like this
AST_node
/
/
FUNC_TYPE
/
/ EXPR_TYPE
add //node
/
c return ans
/
= c =
/
+ END add
/
a b +
1
END:
in syntax tree we traverse until getting an END(end signal).
on EXPR_TYPE means expression
ans =add()+ 1
we search tree for FUNC_TYPE(function type) add then execute(or generate code) until
an End signal
iam implemented like this it worked for me,thing it will work for you