Using ANTLR to parse JavaDoc comments

https://stackoverflow.com/questions/3836096

26-09-2019

Question

I'm trying to parse one particular (home-grown) JavaDoc-style tag in my JavaScript files, and I can't work out how to do it. ANTLR reports the warnings listed in the notes below the grammar:

jsDocComment 
    : '/**' (importJsDocCommand | ~('*/'))* '*/' <== See note 1
    ;

importJsDocCommand
    : '@import' gav
    ;

gav
    :  gavGroup ':' gavArtifact
    -> ^(IMPORT gavGroup gavArtifact)
    ;

gavGroup 
    : gavIdentifier
    ;

gavArtifact
    : gavIdentifier
    ;

gavIdentifier 
    : ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'-'|'.')* <== See note 2
    ;

Note 1: The following alternatives can never be matched: 1
Note 2: Decision can match input such as "'_'..'.'" using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input

Here's what I'm trying to parse:

/** a */
/** @something */
/** @import com.jquery:jquery */

All three lines should parse without errors, but only the @import line (with its Maven group:artifact value) should produce an AST node, named "IMPORT".

Thanks for your assistance.

Solution 2

My solution was to use an ANTLR lexer on its own, without a parser, in filter mode so that it skips everything I'm not interested in. Here's what I came up with (it looks for globally defined variables as well as imports):

lexer grammar ECMAScriptLexer;

// filter mode: the lexer skips any input that matches none of the rules below
options {filter=true;}

@lexer::header {
    package com.classactionpl.mojo.javascript;
}

@members {
    int scopeLevel = 0;
}

IMPORTDOC
    :   '/**' .* IMPORT .* (IMPORT)* '*/'
    ;

fragment 
IMPORT
    :   '@import' WS groupId=GAVID ':' artifactId=GAVID
        {System.out.println("found import: " + $groupId.text + ":" + $artifactId.text);}
    ;

fragment
GAVID  
    :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'-'|'0'..'9'|'.')*
    ;

COMMENT
    :   '/*' .* '*/'
    ;

SL_COMMENT
    :   '//' .* '\n' 
    ;

ENTER_SCOPE
    :   '{' {++scopeLevel;}
    ;

EXIT_SCOPE
    :   '}' {--scopeLevel;}
    ;

WINDOW_VAR
    :   'window.' name=ID WS? value=(';' | '=') ~('=')
        {
            System.out.println("found window var " + $name.text + " = " + ($value == ';'));
        }
    ;

GLOBAL_VAR
    :   'var' WS name=ID WS? value=(';' | '=') ~('=')
        {
            if (scopeLevel == 0) {
                System.out.println("found global var " + $name.text + " = " + ($value == ';'));
            }
        }
    ;

fragment
ID  :   ('a'..'z'|'A'..'Z'|'$'|'_') ('a'..'z'|'A'..'Z'|'$'|'_'|'0'..'9')*
    ;

fragment
WS  :   (' '|'\t'|'\n')+
    ;

OTHER TIPS

Christopher Hunt wrote:

Note 1: The following alternatives can never be matched: 1

~('*/') is incorrect: inside lexer rules you can only negate single characters, not multi-character strings such as '*/'. Moreover, your snippet uses ~ inside a parser rule, where it negates tokens rather than characters. For example:

parse : ~A;
foo   : .;
A     : 'A';
B     : 'B';
C     : 'C';

Here the parse rule does not match "any character except 'A'": it matches any single token other than A, that is, either B or C. Likewise, foo does not match any character; it matches any single token (that is, any lexer rule).
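
To put the two kinds of negation side by side, here is a minimal sketch, assuming ANTLR v3 syntax (which the rewrite operator in the question suggests). The rule names LineComment, anyButDocEnd and DocEnd are made up for illustration and are not part of the grammar in the question:

```antlr
// Lexer rule: ~ negates single characters (or a set of them).
// Matches '//' followed by zero or more characters that are neither CR nor LF.
LineComment
  :  '//' ~('\r' | '\n')*
  ;

// Parser rule: ~ negates tokens, not characters.
// Matches exactly one token of any type other than DocEnd.
anyButDocEnd
  :  ~DocEnd
  ;

DocEnd
  :  '*/'
  ;
```

Because '*/' is two characters long, it has to be a token of its own before a parser rule can exclude it, which is what the DocEnd rule does in this sketch.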

Christopher Hunt wrote:

Note 2: Decision can match input such as "'_'..'.'" using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input

Two questions:

did you post the entire grammar?
are you trying to parse the entire JS file or are you just "filtering" JS files and pulling out the JavaDoc comments?

If it's the latter, there is a much easier way to do this with ANTLR, and I can explain it if that is the case.

EDIT

It's easiest to just add a new DocComment rule to the lexer and to place it just above the (existing) Comment rule:

DocComment
  :  '/**' (options {greedy=false;} : .)* '*/'
  ;

Comment
  :  '/*' (options {greedy=false;} : .)* '*/' {$channel=HIDDEN;}
  ;
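
If the rest of the JavaScript does not need to be parsed at all, the same DocComment rule could also be used on its own in a filtering lexer. This is only a rough sketch, assuming ANTLR v3; the grammar name DocCommentFilter is hypothetical:

```antlr
lexer grammar DocCommentFilter;

// filter mode: input that matches no rule is skipped instead of reported as an error
options { filter=true; }

// Matches from '/**' up to the first following '*/' (non-greedy loop)
DocComment
  :  '/**' (options {greedy=false;} : .)* '*/'
  ;
```

With filter=true, the generated lexer should emit only DocComment tokens and skip the rest of the file, so the @import tags can then be picked out of each token's text.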

Licensed under: CC-BY-SA with attribution
