Question

I'm trying to write a lexer/parser for R6RS, and I'm stuck on the datum-skipping comment (#;).

Here are some of my lexer/parser rules:

BOOLEAN: '#t' | '#f' | '#T' | '#F';
NUMBER: DIGIT+; // TODO: incomplete
CHAR: '#\\' CHARNAME | '#\\x' HEXDIGIT+ | '#\\' . ;
STRING: '"' STRELEMENT* '"';
IDENTIFIER: INITIAL SUBSEQUENT* | PERCULIAR_ID;

COMMENT: (';' .*? LINE_ENDING | '#!r6rs' ) -> skip;
NESTED_COMMENT: '#|' (NESTED_COMMENT | ~[#|] | ('|' ~'#') | ('#' ~'|') )* '|#' -> skip;

datum: lexemeDatum
     | compoundDatum;
compoundDatum: list
             | vector
             | bytevector;

// (rest omitted...)

Now, I want to write something like skipDatum: '#;' datum -> skip. Unfortunately, a parser rule doesn't allow -> skip. SKIPDATUM: '#;' datum -> skip wouldn't work either, because a lexer rule can't reference a parser rule.

In my opinion, "commenting out" is the responsibility of the lexer and "constructing a datum" is the responsibility of the parser, but the rule for #; needs both.

Here is my current solution:

skipDatum: '#;' datum;

list: '(' (datum|skipDatum)* ')' #ProperListDatum
    | '[' (datum|skipDatum)* ']' #ProperListDatum
    | '(' skipDatum* datum (datum|skipDatum)* '.' skipDatum* datum skipDatum* ')' #ImproperListDatum
    | '[' skipDatum* datum (datum|skipDatum)* '.' skipDatum* datum skipDatum* ']' #ImproperListDatum;

While it works, it is ugly: wherever I really want to write a rule using datum, I always have to write something like skipDatum* datum skipDatum*.

Is there any better solution? Thanks in advance.


Solution

You could use something like this.

datum
    :   SKIP_DATUM? ...
    ;

SKIP_DATUM : '#;';

This simplifies the grammar, but it requires you to perform the following check every time you use a DatumContext in the generated code:

if (ctx.SKIP_DATUM() != null) {
    // handle skipped datum here (return?)
}
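
To show what that check amounts to when walking the tree, here is a minimal sketch that does not depend on ANTLR. The Datum class is a hypothetical stand-in for the generated DatumContext, and its skipped field plays the role of ctx.SKIP_DATUM() != null; the names atom, list, skip and render are made up for this illustration. The idea is to put the check in the one place that iterates over child datums, so it is not repeated at every use site.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipDatumSketch {
    // Hypothetical stand-in for the generated DatumContext.
    // `skipped` models "ctx.SKIP_DATUM() != null".
    static final class Datum {
        final boolean skipped;
        final String atom;           // set for atoms, null for lists
        final List<Datum> children;  // set for lists, null for atoms

        Datum(boolean skipped, String atom, List<Datum> children) {
            this.skipped = skipped;
            this.atom = atom;
            this.children = children;
        }
    }

    static Datum atom(String text) {
        return new Datum(false, text, null);
    }

    static Datum list(Datum... items) {
        return new Datum(false, null, Arrays.asList(items));
    }

    // Marks a datum as preceded by "#;".
    static Datum skip(Datum d) {
        return new Datum(true, d.atom, d.children);
    }

    // Renders a datum back to text, dropping every child marked as skipped.
    // The caller is expected to check the top-level datum itself.
    static String render(Datum d) {
        if (d.atom != null) {
            return d.atom;
        }
        List<String> parts = new ArrayList<>();
        for (Datum child : d.children) {
            if (child.skipped) {
                continue; // the check from the answer, done in one place
            }
            parts.add(render(child));
        }
        return "(" + String.join(" ", parts) + ")";
    }

    public static void main(String[] args) {
        // Models the source text: (a #;b (c #;(d e) f))
        Datum tree = list(
                atom("a"),
                skip(atom("b")),
                list(atom("c"), skip(list(atom("d"), atom("e"))), atom("f")));
        System.out.println(render(tree)); // prints (a (c f))
    }
}
```

In a real ANTLR visitor or listener, the same filter would presumably sit wherever the children of a list or vector are collected.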
Licensed under: CC-BY-SA with attribution