First and foremost, lexer rules are always global in ANTLR. Every token in your input will be assigned one, and only one, token type. If you separate your lexer rules into multiple files, it becomes a maintenance nightmare to determine cases where tokens are ambiguous. The general rule is:
Avoid using import for lexer grammars which contain rules that are not marked with the fragment modifier.
The ATTR token will be assigned to inputs matching what looks like an ATTR, regardless of whether or not the predicate in the attr rule succeeds. This will prevent inputs which match the ATTR rule from being considered as another token type. You should move the semantic predicate from the attr rule to the ATTR rule to prevent the lexer from ever creating ATTR tokens for inputs which are not in the set of predefined attributes.
The ParserRuleContext.exception field is not guaranteed to be set in the event of a syntax error. The only way to determine that a syntax error did not occur is to call Parser.getNumberOfSyntaxErrors() after parsing, or add your own ANTLRErrorListener.
Your last lexer rule should resemble the following. Otherwise, input sequences which do not match a lexer rule will be silently dropped. This rule passes those inputs on to the parser for handling/reporting.
ErrorChar : . ;
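The lexer-side advice above (put the predicate in the ATTR rule, keep a catch-all rule last) could be sketched roughly as follows. This is a minimal sketch, not the asker's actual grammar: the grammar name RuleLexer and the attributes member are assumptions made for illustration, and the token rules are condensed from the ones in the question. It relies on the Java target, where getText() inside a lexer predicate returns the text matched so far for the current token.

```antlr
lexer grammar RuleLexer;   // hypothetical name

@members {
    // Assumption: the glue code fills this set from the database before lexing.
    public java.util.Set<String> attributes = new java.util.HashSet<>();
}

REPLACE : 'rep' | 'replace' ;
IN      : 'in' ;
WITH    : 'with' ;

// The predicate now lives in the lexer rule, so text that is not a known
// attribute is never turned into an ATTR token.
ATTR
    : [A-Z] [a-zA-Z0-9/]+ ( ' ' [A-Z] [a-zA-Z0-9/]+ )?
      {attributes.contains(getText())}?
    ;

VALUE : '\'' [a-zA-Z0-9\-]+ '\'' ;
WS    : [ \t\r\n]+ -> skip ;

// Keep this rule last: any character no other rule accepts becomes an
// ErrorChar token, which the parser then reports as a syntax error.
ErrorChar : . ;
```

With a lexer along these lines, the stray "any" in "replace any 'Acme' ..." should reach the parser as ErrorChar tokens instead of disappearing before the parser sees it.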
For complicated grammars, avoid using combined grammars. Instead, create separate lexer grammar and parser grammar grammars, where the parser grammars use the tokenVocab option to import the tokens. Combined grammars allow you to implicitly declare lexer rules by writing string literals in parser rules, which reduces the maintainability of large grammars.
ReplaceInWith.g4 contains many rules with embedded actions. These actions should be moved to a separate listener that you run after parsing is complete, and the returns clauses from these rules should be removed. This improves both the portability and reusability of your grammar. An example of how to do this can be seen in these commits which are part of a larger pull request showing conversion of an application using ANTLR 3 to ANTLR 4.
ANTLR4 Accepting additional tokens as valid?
Question
I'm building a small rule language to test and get used to ANTLR. I'm using ANTLR V4 and I have the following grammar split as follows:
Lexer.g4
lexer grammar Lexer;
/*------------------------------------------------------------------
 * LEXER RULES - GENERIC KEYWORDS
 *------------------------------------------------------------------*/
NOT
    : 'not'
    ;
NULL
    : 'null'
    ;
AND
    : 'and'
    | '&'
    ;
/*------------------------------------------------------------------
 * LEXER RULES - PATTERN MATCHING
 *------------------------------------------------------------------*/
DELIM
    : [\|\\/:,&@+><^]
    ;
WS
    : [ \t\r\n]+ -> skip
    ;
VALUE
    : SQUOTE TEXT SQUOTE
    ;
fragment SQUOTE
    : '\''
    ;
fragment TEXT
    : ( 'a'..'z'
      | 'A'..'Z'
      | '0'..'9'
      | '-'
      )+ ;
Attribute.g4
grammar Attribute;
/*------------------------------------------------------------------
 * Semantic Predicate
 *
 * Attributes are capitalised words that may have spaces. They're
 * loaded from the database and set in the glue code so that
 * they can be cross checked here. If the grammar passed in sees
 * an attribute it will pass so long as the attribute is in the
 * database, otherwise the grammar will fail to parse.
 *------------------------------------------------------------------*/
attr
    : a=ATTR {attributes.contains($a.text)}?
    ;
ATTR
    : ([A-Z][a-zA-Z0-9/]+([ ][A-Z][a-zA-Z0-9/]+)?)
    ;
ReplaceInWith.g4
grammar ReplaceInWith;
/*------------------------------------------------------------------
 * REPLACE IN WITH PARSER RULES
 *------------------------------------------------------------------*/
replace_in_with
    : rep in with {row.put($in.value , $in.value.replace($rep.value, $with.value));}
    | repAtt with {row.put($repAtt.value, $with.value);}
    ;
rep returns[String value]
    : REPLACE v=VALUE {$value = trimQuotes($v.text);}
    ;
repAtt returns[String value]
    : REPLACE a=attr {$value = $a.text;}
    ;
in returns[String value]
    : IN a=attr {$value = $a.text;}
    ;
with returns[String value]
    : WITH v=VALUE {$value = trimQuotes($v.text);}
    ;
/*------------------------------------------------------------------
 * LEXER RULES - KEYWORDS
 *------------------------------------------------------------------*/
REPLACE
    : 'rep'
    | 'replace'
    ;
IN
    : 'in'
    ;
WITH
    : 'with'
    ;
Parser.g4
grammar Parser;
/*------------------------------------------------------------------
 * IMPORTED RULES
 *------------------------------------------------------------------*/
import //Essential imports
       Attribute,
       GlueCode,
       Lexer,
       //Actual Rules
       ReplaceInWith;
/*------------------------------------------------------------------
 * PARSER RULES
 * MUST ADD EACH TOP LEVEL RULE HERE FOR IT TO BE CALLABLE
 *------------------------------------------------------------------*/
eval
    : replace_in_with
    ;
GlueCode.g4
Java to supply static calling functionality to the grammar and to set the attributes up from the database.
ParserErrorListener.java
public class ParserErrorListener extends ParserBaseListener
{
    /**
     * After every rule check to see if an exception was thrown, if so exit with a runtime exception to indicate a
     * parser problem.<p>
     */
    @Override
    public void exitEveryRule(@NotNull ParserRuleContext ctx)
    {
        super.exitEveryRule(ctx);
        if (ctx.exception != null)
        {
            throw new ParserRuntimeException(String.format("Error evaluating expression(s) '%s'", ctx.exception));
        } //if
    } //exitEveryRule
} //class
When I supply the following to the grammar it passes as expected:
"replace 'Acme' in Name with 'acme'",
"rep 'Acme' in Name with 'acme'",
"replace 'Acme' in Name with 'ACME'",
"rep 'Acme' in Name with 'ACME'",
"replace 'e' in Name with 'i'",
"rep 'e' in Name with 'i'",
"replace '-' in Number with ' '",
"rep '-' in Number with ' '",
"replace '555' in Number with '00555'",
"rep '555' in Number with '00555'"
Here NAME and NUMBER are set up as attributes for the semantic predicate.
However, when I pass in the following statements, the grammar still passes, and I'm not sure why they match:
"replace any 'Acme' in Name with 'acme'",
"replaceany 'Acme' in Name with 'acme'",
Again, NAME is passed in as an attribute to be matched by the semantic predicate; this part of the grammar works in my tests. The part that's failing is the 'any' part. The grammar matches replace and then takes the next token, which it thinks is 'Acme', ignoring the 'any' part in both examples above. I was expecting the grammar to fail here: in the listener, on exiting each rule, I have added a check which should throw a runtime exception, which is caught by the GlueCode to indicate a failure.
Any ideas on how I can get my grammar to throw an error when this occurs?
Solution