I’m aware that most modern languages use reserved words to prevent things like keywords from being used as identifiers.
Reserved words aside, let’s assume a language that allows keywords to be used as identifiers. (For example, in Ruby a keyword can be used as a method name.) During compilation, how would you deal with this ambiguity?
The lexing phase doesn’t seem like a good fit because it would have to consider the tokens around it. The parsing phase also doesn’t seem like a good fit since ideally the parser would work with tokens that are unambiguous.
If I had to design it myself, I suppose I would have the lexer yield an ambiguous token, then have another level that considers the ambiguous token in the context of the tokens around it (e.g. does the ambiguous token follow a def keyword? Then it must be an identifier). Then I would hand the unambiguous tokens to the parser.
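The design described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the keyword set, the token kinds (`WORD`, `KEYWORD`, `IDENT`, `PUNCT`) and the function names are all hypothetical, and the only context rules are "follows def" and "follows a dot".

```python
import re
from typing import NamedTuple

# Hypothetical keyword set for an imaginary Ruby-like language.
KEYWORDS = {"def", "begin", "end", "class"}


class Token(NamedTuple):
    kind: str  # "WORD" (still ambiguous), "KEYWORD", "IDENT" or "PUNCT"
    text: str


def lex(source: str) -> list[Token]:
    """Emit WORD for anything that looks like a name; defer the decision."""
    tokens = []
    for text in re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_]", source):
        is_word = text[0].isalpha() or text[0] == "_"
        tokens.append(Token("WORD" if is_word else "PUNCT", text))
    return tokens


def disambiguate(tokens: list[Token]) -> list[Token]:
    """Resolve each WORD by looking at the previously resolved token."""
    resolved: list[Token] = []
    for tok in tokens:
        if tok.kind == "WORD":
            prev = resolved[-1] if resolved else None
            after_def = prev is not None and prev == Token("KEYWORD", "def")
            after_dot = prev is not None and prev == Token("PUNCT", ".")
            if tok.text in KEYWORDS and not (after_def or after_dot):
                tok = Token("KEYWORD", tok.text)
            else:
                tok = Token("IDENT", tok.text)
        resolved.append(tok)
    return resolved
```

With these assumed rules, `disambiguate(lex("def begin"))` classifies `def` as a keyword and `begin` as an identifier, while a bare `begin` stays a keyword. The parser then only ever sees `KEYWORD`, `IDENT` and `PUNCT` tokens.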
In languages that allow keywords to be used as identifiers, what is the standard way for the compiler to tell them apart?
1
Notice that in Ruby you cannot call a method with such a name directly, e.g. you cannot do
begin()
You can do
obj.begin()
That works because, after a receiver, the grammar can have rules like:
*Arguments* :
"(" ")"
*MemberExpression* :
*MemberExpression* "." *IdentifierName*
*CallExpression* :
*MemberExpression* *Arguments*
(Rules unrelated to the example are left out for brevity.)
Recognizing this only requires separating the rule Identifier from IdentifierName:
*Identifier*:
*IdentifierName* **but not reserved word**
*IdentifierName*:
//Rules for identifier names here
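To make the Identifier / IdentifierName split concrete, here is a minimal recursive-descent sketch of the rules above. It is an assumption-laden toy, not how Ruby or any JavaScript engine actually implements it: the reserved-word set, `tokenize`, `is_name` and `parse_call` are hypothetical names, and only the empty argument list `()` is supported.

```python
import re

# Hypothetical reserved-word set for the toy grammar.
RESERVED = {"begin", "end", "if"}


def tokenize(source: str) -> list[str]:
    """Split into names and the three punctuation marks the toy grammar uses."""
    return re.findall(r"[A-Za-z_]\w*|[().]", source)


def is_name(tok: str) -> bool:
    """IdentifierName: any name-shaped token, reserved or not."""
    return re.fullmatch(r"[A-Za-z_]\w*", tok) is not None


def parse_call(source: str):
    """CallExpression: MemberExpression Arguments, returned as nested tuples."""
    toks = tokenize(source)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    # Identifier: IdentifierName but not reserved word
    head = take()
    if head is None or not is_name(head) or head in RESERVED:
        raise SyntaxError(f"expected identifier, got {head!r}")
    node = ("name", head)

    # MemberExpression "." IdentifierName: reserved words are allowed here
    while peek() == ".":
        take()
        member = take()
        if member is None or not is_name(member):
            raise SyntaxError(f"expected name after '.', got {member!r}")
        node = ("member", node, member)

    # Arguments: "(" ")"
    if take() != "(" or take() != ")":
        raise SyntaxError("expected '()'")
    return ("call", node)
```

In this sketch `parse_call("obj.begin()")` succeeds, while `parse_call("begin()")` raises `SyntaxError`, because the leading position uses the stricter Identifier rule. No lookahead or extra pass is needed; the parser's position in the grammar already tells it which rule applies.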
If a statement starts with begin, as in
begin()
then you have already activated a rule like
*Block*:
"begin" *indent* *statement* *outdent* "end"
Ruby doesn’t try to figure out what you mean; it will just be a block.
But for method names that follow a receiver or some other prefix, it is easy to allow keywords in the grammar, and JavaScript, for example, does this too.
Grammar examples are taken from ECMA-262.
3
In .NET, each language has a different set of keywords. This means, for example, that a library written in C# can use identifiers that are reserved in VB.NET. So, to use such a library from VB.NET, you need some way to use keywords as identifiers.
Each language uses a different syntax to do that:
- in C#, you prepend an @: @keyword
- in VB.NET, you enclose it in brackets: [keyword]
- in F#, you enclose it in double backticks: ``keyword``
1
I don’t think there is a standard way.
Sometimes you will see lexer tricks that implement rules like “pure is a keyword only if the next token is the keyword native”.
At other times the grammar may exploit the fact that, in some circumstances, all or some keywords can be interpreted as identifiers without introducing ambiguity.
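The “pure is a keyword only if the next token is native” rule can be sketched with one token of lookahead. This is a simplified illustration: the `classify` function is hypothetical, it only looks at name-shaped tokens, and it assumes native itself is always a keyword.

```python
import re


def classify(source: str) -> list[tuple[str, str]]:
    """Tag each name as KEYWORD or IDENT using one token of lookahead."""
    words = re.findall(r"[A-Za-z_]\w*", source)
    tagged = []
    for i, word in enumerate(words):
        following = words[i + 1] if i + 1 < len(words) else None
        if word == "native":
            kind = "KEYWORD"  # assumed to be reserved unconditionally
        elif word == "pure" and following == "native":
            kind = "KEYWORD"  # contextual: only directly before native
        else:
            kind = "IDENT"
        tagged.append((kind, word))
    return tagged
```

Here `classify("pure native f")` tags `pure` as a keyword, while `classify("int pure")` tags it as an ordinary identifier, so existing code that uses pure as a name keeps working.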