I’m working on a Prolog DCG parser to tokenize a string into specific patterns. My goal is to parse tokens like mul(Number,Number)
, do()
, and don't()
, while ignoring every other patterns.
Here’s my current implementation:
parser([Token|Rest]) --> not_token, token(Token), not_token, parser(Rest).
parser([]) --> [].
token(mul(N1, N2)) -->
"mul(", number(N1), ",", number(N2), ")".
token(do) --> "do()".
token(dont) --> "don't()".
not_token -->
string(Codes),
{ + phrase(token(_), Codes) }.
When I run the following query:
?- phrase(parser(Tokens), `mul(mul(123,456)dommmmulmul(1,2))`).
it does find a correct solution but also a wrong solutions because of how not_token
is defined.
Tokens = [mul(123, 456), mul(1, 2)] ;
Tokens = [mul(123, 456), mul(1, 2)] ;
Tokens = [mul(123, 456), mul(1, 2)] ;
Tokens = [mul(123, 456), mul(1, 2)] ;
Tokens = [mul(123, 456), mul(1, 2)] ;
Tokens = [mul(123, 456), mul(1, 2)] ;
Tokens = [mul(123, 456), mul(1, 2)] ;
Tokens = [mul(123, 456), mul(1, 2)] ;
Tokens = [mul(123, 456), mul(1, 2)] ;
Tokens = [mul(123, 456)] ; -- Incorrect
Tokens = [mul(1, 2)] ; -- Incorrect
false.
I can use the cut to stop at the first solution, but is there a way to express not_token
in DCG so that it only returns a correct result.
For more complicated test case, it’s actually from https://adventofcode.com/2024/day/3.
The current solution can solve the exact problem with a cut, but I want to be able to have my DCG written more strict so that it only finds a correct solution.
2