Question

Terence Parr himself says about ANTLR 3: "Unfortunately, it still seems more difficult to build tokenizer with ANTLR than with a traditional lex-like approach". Pygments, on the other hand, has lexers for almost any language you can think of: http://pygments.org/languages/

Has anyone tried using a Pygments lexer with the ANTLR Python target? ANTLR 2 had an example of using flex with the C++ target, but unfortunately there are no such examples for ANTLR 3.
Can I just hand-write a grammarname.tokens file that the ANTLR parser can import? When I use an ANTLR lexer, there are a bunch of anonymous tokens; can I just remove them? Alternatively, maybe Pygments can be modified to accept the ANTLR .tokens file for its tokens. The Pygments token stream just needs to implement the ANTLR token stream interface.

Solution

This other Q/A was very helpful: "ANTLR Parser with manual lexer". I also read through the StAX and JFlex snippets: http://www.antlr.org/wiki/display/ANTLR3/Interfacing+StAX+to+ANTLR http://www.antlr.org/pipermail/antlr-interest/2007-October/023957.html

The .tokens file is a non-issue if you import the token types from the generated parser file. Unfortunately, I first tried parsing the .tokens file and forgot to convert the token types to integers, which caused a long bug chase...
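For anyone who does go the route of reading the .tokens file, here is a minimal sketch of the conversion step that was missed. It assumes each line of the file has the form `NAME=number` (or `'literal'=number`); the function name `load_token_types` and the sample content are made up for illustration and are not taken from the linked repository.

```python
def load_token_types(tokens_text):
    """Map each token name in ANTLR .tokens text to its integer type."""
    types = {}
    for line in tokens_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last '=' because a quoted literal may itself contain '='.
        name, _, number = line.rpartition("=")
        # Convert to int: left as a string, the type never compares equal
        # to the integer token types the parser works with.
        types[name] = int(number)
    return types


# Hypothetical file content, for illustration only.
sample = "ID=4\nINT=5\n'='=6\n"
token_types = load_token_types(sample)
```

Importing the constants from the generated parser module, as described above, avoids this parsing entirely.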

But I finally figured it out: http://github.com/tinku99/antlr-pygments
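The general shape of such a bridge can be sketched roughly as follows. This is not the code from the repository above, and it has not been run against the ANTLR 3 Python runtime. It assumes the parser side only needs an object with a `nextToken()` method that keeps returning an EOF token (type -1) once the input is used up. The names `SimpleToken` and `TupleTokenSource`, the type numbers, and the sample input are all hypothetical.

```python
EOF = -1  # assumed to match the EOF token type of the ANTLR 3 runtime


class SimpleToken(object):
    """Bare-bones token; a real parser may need more attributes."""

    def __init__(self, type, text, start):
        self.type = type    # integer token type expected by the parser
        self.text = text    # matched source text
        self.start = start  # character offset in the input


class TupleTokenSource(object):
    """Turn (offset, kind, text) triples into tokens via nextToken()."""

    def __init__(self, triples, type_map, skip=()):
        self._triples = iter(triples)
        self._type_map = type_map  # lexer kind -> integer token type
        self._skip = set(skip)     # kinds to drop, e.g. whitespace

    def nextToken(self):
        for offset, kind, text in self._triples:
            if kind in self._skip:
                continue
            return SimpleToken(self._type_map[kind], text, offset)
        # Input exhausted: every further call returns EOF.
        return SimpleToken(EOF, "<EOF>", -1)


# Hand-written triples standing in for real lexer output.
triples = [(0, "Name", "x"), (1, "Text", " "), (2, "Operator", "="),
           (3, "Text", " "), (4, "Number", "1")]
type_map = {"Name": 4, "Operator": 5, "Number": 6}

source = TupleTokenSource(triples, type_map, skip=["Text"])
tokens = []
while True:
    tok = source.nextToken()
    if tok.type == EOF:
        break
    tokens.append((tok.type, tok.text))
```

With Pygments, the triples could come from a lexer's `get_tokens_unprocessed(text)`, which yields `(index, tokentype, value)`, and `type_map` would then be keyed by Pygments token types rather than strings. A real parser will probably also want attributes such as line and channel on each token.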

Other tips

Naveen wrote:

Has anyone tried using a Pygments lexer with the ANTLR Python target?

I doubt it. At least, I have never seen anyone mention this, either here on SO or on the ANTLR mailing lists (which I have been monitoring for quite some time now).

Naveen wrote:

Can I just hand-write a grammarname.tokens file that the ANTLR parser can import?

No. The parser expects an instance of a Lexer object, which is present in the (Python) runtime. A .tokens file is not supposed to be edited by hand.

Naveen wrote:

When I use an ANTLR lexer, there are a bunch of anonymous tokens; can I just remove them?

Not quite sure what you mean, but removing any of the generated code seems a bad idea to me. If you're referring to the .tokens file, as I mentioned before: it is not supposed to be edited by hand.

I really wouldn't bother trying to "glue" some external lexer grammar, or a complete lexer, into ANTLR. I am pretty sure this will cost you more time to implement than just writing the ANTLR lexer grammar yourself. After all, defining the lexer rules is the easiest part of a language in most cases.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow