I am new to tree-sitter, and I am trying to write a grammar that splits the input into single words unless they are surrounded by double quotes.
I created the following grammar:
<code>_statement: $ => seq(
  repeat1($.any_token),
),
any_token: $ => choice(
  $.string_token,
  $.single_word,
),
string_token: $ => prec(2, seq("\"", repeat1($._string_content), "\"")),
_string_content: $ => /[^"]+/,
single_word: $ => prec(1, /[^ \t\r\n]+/)
</code>
I expected an input like "aaa" "bbb ccc" to be parsed into 2 string_token nodes, since string_token has the higher precedence, but it is actually parsed into 3 single_word tokens.
(When I remove single_word from the choice, the input is indeed parsed into 2 tokens of type string_token.)
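To make the expectation concrete, here is a rough sketch in plain JavaScript (not tree-sitter itself) of the split I am after, next to what I seem to be getting. The function names `splitTokens` and `splitOnWhitespace` are made up for illustration, and the second one is only my guess at a model of the observed behaviour, not a description of how tree-sitter actually lexes.

```javascript
// Hypothetical helper, not part of tree-sitter: the tokenization I expect.
function splitTokens(input) {
  const tokens = [];
  // First alternative: a double-quoted run (captured in group 1).
  // Second alternative: a run of non-whitespace characters.
  const pattern = /("[^"]+")|[^ \t\r\n]+/g;
  for (const match of input.matchAll(pattern)) {
    const type = match[1] !== undefined ? "string_token" : "single_word";
    tokens.push({ type, text: match[0] });
  }
  return tokens;
}

// Assumed model of what I observe: every whitespace-separated chunk
// becomes its own single_word, quotes included.
function splitOnWhitespace(input) {
  return input
    .split(/[ \t\r\n]+/)
    .filter((chunk) => chunk.length > 0)
    .map((text) => ({ type: "single_word", text }));
}

const input = '"aaa" "bbb ccc"';
console.log(splitTokens(input));       // expected: 2 string_token entries
console.log(splitOnWhitespace(input)); // observed: 3 single_word entries
```

With the input above, the first function yields the two quoted strings, while the second yields the three chunks `"aaa"`, `"bbb` and `ccc"`, which matches the three single_word tokens I see.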