In ANTLR3, I need to discriminate between a 'comment' and a 'directive' (which looks like a comment)

StackOverflow https://stackoverflow.com/questions/21688451

  •  09-10-2022

Question

I'm fairly new to ANTLR, and I've run into a problem.

I have a grammar I'm trying to write for a language that includes single-line comments and language directives that begin with the same comment identifier. For example:

--This is a comment.  What follows is a directive with a parameter
--directive:param

A directive will always be in that format - two dash characters followed by a command, a colon, and a single parameter.

I would like to have the lexer ignore an actual comment (send it to the hidden channel), but tokenize the directives. I have the following lexer rules:

DCOMMAND    : DATABASE;
fragment DATABASE   : D A T A B A S E;
fragment COMMENTSTART   : '--';
LINE_COMMENT    : COMMENTSTART ~(DCOMMAND|('\n'|'\r')*) {$channel=HIDDEN;};
fragment A  : ('a'|'A');
fragment B  : ('b'|'B');
fragment C  : ('c'|'C');
fragment D  : ('d'|'D');
....

There's only one directive for now: 'database'. Eventually, the DCOMMAND token may represent several keywords. The problem is that my lexer always shoves anything that starts with '--' into the hidden channel. How do I make the LINE_COMMENT token not match directives? Or will I have to move comment handling into the parser?


Solution

AFAIK, there's no way to handle this in your lexer grammar without some manual code (which is IMHO better than promoting comments to the parser!).

What you could do is this:

  • match '--'
  • in a custom method, manually look ahead until the end of the line (EOL), and let this method return true when the '--' starts a directive:
    • if the text up to the EOL looks like a directive, do NOT consume the characters and return true
    • if the text up to the EOL isn't a directive, consume the characters and return false
  • if your custom method returned false, it must be a comment and you can skip() it

A quick demo:

grammar T;

@lexer::members {

  private boolean directiveAhead() throws MismatchedTokenException {

    StringBuilder b = new StringBuilder();

    for(int ahead = 1; ; ahead++) {

      // Grab the next character from the input.
      int next = input.LA(ahead);

      // Check if we're at the EOL.
      if(next == -1 || next == '\r' || next == '\n') {
        break;
      }

      b.append((char)next);
    }

    if(b.toString().trim().matches("\\w+:\\w+")) {
      // Do NOT let the lexer consume all the characters, just return true!
      return true;
    }
    else {
      // Let the lexer consume all the characters!
      this.match(b.toString());
      return false;
    }
  }
}

parse
 : directive EOF
 ;

directive
 : DIRECTIVE_START IDENTIFIER COL IDENTIFIER 
 ;

IDENTIFIER
 : ('a'..'z' | 'A'..'Z')+
 ;

DIRECTIVE_START
 : '--' { if(!directiveAhead()) skip(); }
 ;

COL
 : ':'
 ;

SPACES
 : (' ' | '\t' | '\r' | '\n')+ {skip();}
 ;
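
If you want to sanity-check the classification step without generating a lexer, the same decision can be sketched in plain Java. This is only an illustration, not part of the grammar: the class name DirectiveCheck and its helper methods are hypothetical, and the sketch assumes the same `\w+:\w+` shape the demo uses for a directive.

```java
import java.util.regex.Pattern;

// Hypothetical stand-alone helper; it mirrors the check in directiveAhead()
// but works on a plain String instead of the lexer's input stream.
class DirectiveCheck {

    // Same shape as the regex in the demo: command, colon, single parameter.
    private static final Pattern DIRECTIVE = Pattern.compile("\\w+:\\w+");

    // Returns the text from 'start' up to (not including) the first '\r' or '\n'.
    public static String restOfLine(String input, int start) {
        int end = start;
        while (end < input.length()) {
            char c = input.charAt(end);
            if (c == '\r' || c == '\n') {
                break;
            }
            end++;
        }
        return input.substring(start, end);
    }

    // True when the input starts with "--" and the rest of that line looks like a directive.
    public static boolean isDirective(String input) {
        if (!input.startsWith("--")) {
            return false;
        }
        String rest = restOfLine(input, 2);
        return DIRECTIVE.matcher(rest.trim()).matches();
    }

    public static void main(String[] args) {
        // The two sample lines from the question.
        System.out.println(isDirective("--directive:param"));
        System.out.println(isDirective("--This is a comment.  What follows is a directive with a parameter"));
    }
}
```

Note that, as in the demo, the text is trimmed before matching, so a line like `-- directive:param` with surrounding spaces is also treated as a directive. If only specific commands such as 'database' should count, the regex would need to be tightened accordingly.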
Licensed under: CC-BY-SA with attribution