ANTLR4 book, calculator exercise

https://stackoverflow.com/questions/22386760

antlr
antlr4

14-06-2023

Question

I am stuck at a very fundamental level with ANTLR. I am going through 'The Definitive ANTLR 4 Reference' by Dr. Parr. In section 4.2, 'Building a Calculator Using a Visitor', the following grammar is listed:

grammar LabeledExpr; // rename to distinguish from Expr.g4

prog:   stat+ ;

stat:   expr NEWLINE                # printExpr
    |   ID '=' expr NEWLINE         # assign
    |   NEWLINE                     # blank
    ;

expr:   expr op=('*'|'/') expr      # MulDiv
    |   expr op=('+'|'-') expr      # AddSub
    |   INT                         # int
    |   ID                          # id
    |   '(' expr ')'                # parens
    ;

MUL :   '*' ; // assigns token name to '*' used above in grammar
DIV :   '/' ;
ADD :   '+' ;
SUB :   '-' ;
ID  :   [a-zA-Z]+ ;      // match identifiers
INT :   [0-9]+ ;         // match integers
NEWLINE:'\r'? '\n' ;     // return newlines to parser (is end-statement signal)
WS  :   [ \t]+ -> skip ; // toss out whitespace

I'm trying to add a clear statement to the grammar above, following this exercise from the book:

Before moving on, you might take a moment to try to extend this expression language by adding a clear statement. The clear command should clear out the memory map, and you’ll need a new alternative in rule stat to recognize it. Label the alternative with # clear and then run ANTLR on the grammar to get the augmented visitor interface.

This is my attempt:

grammar LabeledExpr;

prog:   stat+ ;

stat:   expr NEWLINE          # printExpr
    |   ID '=' expr NEWLINE   # assign
    |   clear NEWLINE         # clearMem
    |   NEWLINE               # blank
    ;

expr:   expr op=('*'|'/') expr   # MulDiv
    |   expr op=('+'|'-') expr   # AddSub
    |   INT                   # int
    |   ID                    # id
    |   '(' expr ')'          # parens
    ;

clear:  CLEAR               
    ;

MUL:    '*' ;   // assigns token name to '*' used above in grammar
DIV:    '/' ;
ADD:    '+' ;
SUB:    '-' ;

ID  :   [a-zA-Z]+ ;
INT :   [0-9]+ ;
NEWLINE: '\r'?'\n' ;
WS  :   [ \t]+ -> skip ;
CLEAR:  'clear';

However, visitClearMem never gets called:

@Override
public Integer visitClearMem(LabeledExprParser.ClearMemContext ctx) {
    String text = ctx.getText();
    if (text.equalsIgnoreCase("clear")) {
        memory.clear();
    }
    return 0;
}

Solution

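The problem is the position of the CLEAR rule in your lexer. The input clear matches both ID and CLEAR, and both rules match the same number of characters. When that happens, ANTLR chooses the rule that appears first in the grammar. Here that is ID, so the input clear becomes an ID token and the clear alternative in stat is never reached.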

In general, you want to place all keywords for your language before the rule for other identifiers, to ensure that they are properly matched.
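The tie-breaking rule can be illustrated with a small standalone sketch. This is not ANTLR's implementation; it only imitates the behaviour described above (longest match wins, ties go to the rule listed first) using java.util.regex. The class LexerOrderDemo and its helpers rules and winningRule are hypothetical names made up for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LexerOrderDemo {

    // Hypothetical helper: builds an ordered rule table from name/regex pairs.
    // LinkedHashMap keeps insertion order, standing in for rule order in a grammar.
    static Map<String, String> rules(String... nameThenRegex) {
        Map<String, String> table = new LinkedHashMap<>();
        for (int i = 0; i < nameThenRegex.length; i += 2) {
            table.put(nameThenRegex[i], nameThenRegex[i + 1]);
        }
        return table;
    }

    // Returns the name of the rule that wins at the start of the input:
    // the longest match wins, and on a tie the earlier rule is kept.
    static String winningRule(Map<String, String> rules, String input) {
        String best = null;
        int bestLength = 0;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            Matcher m = Pattern.compile(rule.getValue()).matcher(input);
            // Strict '>' means a later rule with an equal-length match never replaces an earlier one.
            if (m.lookingAt() && m.end() > bestLength) {
                best = rule.getKey();
                bestLength = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> idFirst = rules("ID", "[a-zA-Z]+", "CLEAR", "clear");
        Map<String, String> keywordFirst = rules("CLEAR", "clear", "ID", "[a-zA-Z]+");

        System.out.println(winningRule(idFirst, "clear"));          // ID
        System.out.println(winningRule(keywordFirst, "clear"));     // CLEAR
        System.out.println(winningRule(keywordFirst, "clearance")); // ID (longer match wins)
    }
}
```

The last line shows why moving the keyword first is safe: a longer identifier such as clearance is still matched as an ID, because the longest match takes priority over rule order.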

OTHER TIPS

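As explained by Sam Harwell, moving the rule CLEAR: 'clear'; above the ID rule makes it work. The visitor code also needs a minor modification: the clearMem alternative is clear NEWLINE, so ctx.getText() includes the newline and an exact comparison with "clear" fails. Using contains instead works: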

@Override
public Integer visitClearMem(LabeledExprParser.ClearMemContext ctx) {
    String text = ctx.getText();
    if (text.contains("clear")) {
        memory.clear();
    }
    return 0;
}
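The difference between the two comparisons can be checked without ANTLR. The sketch below assumes that ctx.getText() for the clearMem alternative returns the keyword followed by the newline, for example "clear\n". The class ClearTextCheck and its methods are hypothetical names for this illustration, and the trim() variant is an alternative I am suggesting, not something from the original answers.

```java
public class ClearTextCheck {

    // The check from the question: requires the whole text to equal "clear" (ignoring case).
    static boolean exactMatch(String text) {
        return text.equalsIgnoreCase("clear");
    }

    // The check from the answer: only requires "clear" to appear somewhere in the text.
    static boolean containsMatch(String text) {
        return text.contains("clear");
    }

    // Possible alternative: strip the surrounding whitespace (including "\r\n") first.
    static boolean trimmedMatch(String text) {
        return text.trim().equalsIgnoreCase("clear");
    }

    public static void main(String[] args) {
        String text = "clear\n"; // assumed result of ctx.getText() for "clear NEWLINE"

        System.out.println(exactMatch(text));    // false
        System.out.println(containsMatch(text)); // true
        System.out.println(trimmedMatch(text));  // true
    }
}
```

Since the parser only reaches visitClearMem when the clearMem alternative has matched, the text check is arguably redundant, and calling memory.clear() unconditionally would likely behave the same.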

Licensed under: CC-BY-SA with attribution
