Use a Pygments lexer with the ANTLR Python target

https://stackoverflow.com/questions/7305043

25-10-2019

Question

Terence Parr himself says about ANTLR 3: "Unfortunately, it still seems more difficult to build tokenizer with ANTLR than with a traditional lex-like approach". Pygments, on the other hand, has lexers for almost any language you can think of: http://pygments.org/languages/

Has anyone tried using a Pygments lexer with the ANTLR Python target? ANTLR 2 had an example of using flex with the C++ target; unfortunately, there are no such examples for ANTLR 3.
Can I just hand-write a grammarname.tokens file that the ANTLR parser can import? When I use an ANTLR lexer, there are a bunch of anonymous tokens; can I just remove them? Alternatively, maybe Pygments can be modified to accept the ANTLR .tokens file for its tokens. The Pygments token stream would then just need to implement the ANTLR token stream interface.

Solution 2

This other Q/A was very helpful: ANTLR Parser with manual lexer. I also read through the StAX and JFlex snippets: http://www.antlr.org/wiki/display/ANTLR3/Interfacing+StAX+to+ANTLR http://www.antlr.org/pipermail/antlr-interest/2007-October/023957.html

The .tokens file is a non-issue if you import the token types from the generated parser file. Unfortunately, I first tried parsing the .tokens file and forgot to convert the token types to integers, which caused a long bug chase...
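To illustrate the integer pitfall, here is a minimal sketch of parsing a .tokens file by hand. It assumes the file holds one NAME=number entry per line (quoted literals such as '+'=6 included); the function name parse_tokens_file and the sample contents are made up for illustration and are not taken from the linked project.

```python
def parse_tokens_file(text):
    """Map token names (or quoted literals) to integer token types."""
    types = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the LAST '=' so that a literal like '='=7 stays intact.
        name, _, number = line.rpartition("=")
        # Without int() the values stay strings and never compare equal
        # to the integer token types used by the generated parser.
        types[name] = int(number)
    return types


# Hypothetical file contents, for illustration only.
sample = "ID=4\nINT=5\n'+'=6\n'='=7\n"
token_types = parse_tokens_file(sample)
```

Importing the constants from the generated parser module avoids this step entirely, since they are already integers there.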

But I finally figured it out: http://github.com/tinku99/antlr-pygments
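For readers who just want the general shape of such glue code, below is a rough sketch; it is not the code from the repository above. It assumes the external lexer yields (offset, kind, text) triples, which is the shape Pygments' get_tokens_unprocessed produces, and that the parser side pulls tokens through a nextToken() method until it sees token type -1 (EOF), as the ANTLR 3 runtimes do. SimpleToken, TripleTokenSource, type_map and skip are hypothetical names; a real adapter would return the runtime's own token objects instead of this stand-in.

```python
from collections import namedtuple

EOF = -1  # ANTLR 3 uses -1 as the end-of-file token type

# Stand-in for the runtime's token class (an assumption, not the real class);
# the field names are modelled on ANTLR 3's CommonToken.
SimpleToken = namedtuple("SimpleToken", "type text line charPositionInLine")


class TripleTokenSource:
    """Hypothetical adapter handing out tokens one at a time via nextToken()."""

    def __init__(self, triples, type_map, skip=()):
        self._triples = iter(triples)   # (offset, kind, text) triples
        self._type_map = type_map       # kind name -> integer token type
        self._skip = set(skip)          # kinds dropped before the parser
        self._line = 1
        self._col = 0

    def nextToken(self):
        for _offset, kind, text in self._triples:
            line, col = self._line, self._col
            # Advance the position past this piece of text.
            newlines = text.count("\n")
            if newlines:
                self._line += newlines
                self._col = len(text) - text.rfind("\n") - 1
            else:
                self._col += len(text)
            kind = str(kind)
            if kind in self._skip:
                continue
            return SimpleToken(self._type_map[kind], text, line, col)
        # Input exhausted: keep answering with EOF.
        return SimpleToken(EOF, "<EOF>", self._line, self._col)


# Hand-written triples standing in for real lexer output.
triples = [
    (0, "Token.Name", "x"),
    (1, "Token.Text", " "),
    (2, "Token.Operator", "="),
    (3, "Token.Text", "\n"),
    (4, "Token.Name", "y"),
]
source = TripleTokenSource(
    triples,
    type_map={"Token.Name": 4, "Token.Operator": 5},  # made-up type numbers
    skip={"Token.Text"},
)
tokens = [source.nextToken() for _ in range(4)]
```

In a real setup the integer values in type_map would come from the generated parser, and the triples from something like a Pygments lexer's get_tokens_unprocessed(source_text).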

Other tips

Naveen wrote:

Has anyone tried using a Pygments lexer with the ANTLR Python target?

I doubt it. At least, I have never seen anyone mention this, either here on SO or on the ANTLR mailing lists (which I have been monitoring for quite some time now).

Naveen wrote:

Can I just hand-write a grammarname.tokens file that the ANTLR parser can import?

No. The parser expects an instance of a Lexer object, which is present in the (Python) runtime. A .tokens file is not supposed to be edited by hand.

Naveen wrote:

When I use an ANTLR lexer, there are a bunch of anonymous tokens; can I just remove them?

I'm not quite sure what you mean, but removing any of the generated code seems like a bad idea to me. If you're referring to the .tokens file, as I mentioned before: it is not supposed to be edited by hand.

I really wouldn't bother trying to "glue" some external lexer grammar, or a complete lexer, into ANTLR. I am pretty sure this will take you more time to implement than just writing the ANTLR lexer grammar yourself. After all, defining the lexer rules is the easiest part of a language in most cases.

License: CC-BY-SA with attribution

Not affiliated with StackOverflow