I’m trying to get ANTLR4 to parse a custom string format, but I’m running into problems. Can anyone point out where I’m going wrong?
I want to parse this string:
```
#AAABD11111111[key_1-'VALUE_1']
```
I’d like to get this output:
```
HEAD = #AAABD11111111
PAYLOAD = [key_1-'VALUE_1']
```
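To pin down the split I’m after, here is a rough Python sketch that produces it with a plain regex. This is not ANTLR and not a fix for the grammar. The layout it assumes (three uppercase letters, one of A/B/C, one of D/E, eight digits, then a single bracketed key-'value' entry) is read off the one sample string and my lexer rules, and the names `MESSAGE` and `split_message` are made up for illustration.

```python
import re

# Assumed layout, inferred from the single sample string: '#', three
# uppercase letters (TYPE), one of A/B/C (SUBTYPE), one of D/E (COMPR),
# eight digits (LENGTH), then a bracketed payload with one key-'value' entry.
MESSAGE = re.compile(
    r"(?P<head>#[A-Z]{3}[ABC][DE][0-9]{8})"
    r"(?P<payload>\[[0-9a-zA-Z_]+-'[^']*'\])"
)


def split_message(text):
    """Return (head, payload), or None if text does not fit the assumed layout."""
    # strip() drops a trailing line terminator such as "\r\n" before matching
    match = MESSAGE.fullmatch(text.strip())
    if match is None:
        return None
    return match.group("head"), match.group("payload")


if __name__ == "__main__":
    head, payload = split_message("#AAABD11111111[key_1-'VALUE_1']")
    print("HEAD =", head)
    print("PAYLOAD =", payload)
```

The real format may allow more than one entry per payload, which this sketch does not handle.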
While HEAD is apparently tokenised and parsed successfully, PAYLOAD is generating errors.
These are my lexer rules.
```
lexer grammar stringLexer;

HYPHEN: '-';

// Lexer rules
HASH : '#' -> pushMode(HEAD);

mode HEAD;
TYPE: [A-Z] [A-Z] [A-Z];
SUBTYPE : [ABC];
COMPR : [DE];
LENGTH: DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT;
OSQBR: '[' -> pushMode(PAYLOAD);
CSQBR: ']' -> popMode;

mode PAYLOAD;
STRING : QUOTE ([^\'])* QUOTE;
KEY : [0-9a-zA-Z_]+;
ENTRY : KEY HYPHEN STRING;

fragment DIGIT: [0-9];
fragment QUOTE : '\'';
fragment COMMA: ',';
```
These are my parser rules:
```
parser grammar stringParser;

options { tokenVocab=stringLexer; }

// Parser rules
message : head payload;
head : HASH TYPE SUBTYPE COMPR LENGTH;
payload: OSQBR ENTRY CSQBR;
```
I built it with:
```
java org.antlr.v4.Tool stringLexer.g4
java org.antlr.v4.Tool stringParser.g4
javac string*.java
java org.antlr.v4.gui.TestRig string tokens -tokens
```
And I receive these errors:
```
line 1:20 token recognition error at: '-'
line 1:21 token recognition error at: ''V'
line 1:29 token recognition error at: '']'
line 1:31 token recognition error at: '\r'
line 1:32 token recognition error at: '\n'
[@0,0:0='#',<'#'>,1:0]
[@1,1:3='AAA',<TYPE>,1:1]
[@2,4:4='B',<SUBTYPE>,1:4]
[@3,5:5='D',<COMPR>,1:5]
[@4,6:13='11111111',<LENGTH>,1:6]
[@5,14:14='[',<'['>,1:14]
[@6,15:19='key_1',<KEY>,1:15]
[@7,23:28='ALUE_1',<KEY>,1:23]
[@8,33:32='<EOF>',<EOF>,2:0]
```
Any help would be much appreciated, even if it is just a pointer to a good resource for learning the necessary parts.