Use a Pygments lexer with the ANTLR Python target
Problem
Terence Parr himself says about ANTLR 3: "Unfortunately, it still seems more difficult to build tokenizer with ANTLR than with a traditional lex-like approach". Pygments, on the other hand, has lexers for almost any language you can think of: http://pygments.org/languages/
Has anyone tried using a Pygments lexer with the ANTLR Python target?
ANTLR 2 had an example of using flex with the C++ target; unfortunately, there are no such examples for ANTLR 3.
Can I just hand-write a grammarname.tokens file that the ANTLR parser can import?
When I use an ANTLR lexer, there are a bunch of anonymous tokens. Can I just remove them?
Alternatively, maybe Pygments could be modified to accept the ANTLR .tokens file for its tokens. The Pygments token stream would then just need to implement the ANTLR token stream interface.
Solution 2
This other Q&A was very helpful: "ANTLR Parser with manual lexer". I also read through the StAX and JFlex snippets: http://www.antlr.org/wiki/display/ANTLR3/Interfacing+StAX+to+ANTLR http://www.antlr.org/pipermail/antlr-interest/2007-October/023957.html
The .tokens file is a non-issue if you import the token types from the generated parser file. Unfortunately, I first tried parsing the .tokens file and forgot to convert the token types to integers, which caused a long bug chase.
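For anyone who does go the route of parsing the .tokens file, here is a minimal sketch of the integer conversion that is easy to forget. It assumes the usual `NAME=number` line format, with quoted literals such as `'='` also allowed as names; `parse_tokens_file` and the sample content are hypothetical and not taken from the linked project.

```python
def parse_tokens_file(text):
    """Parse ANTLR-style .tokens content into a {name: int_type} dict.

    Assumes one NAME=number entry per line (an assumption about the format).
    """
    token_types = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the LAST '=' so a quoted literal like '=' stays intact.
        name, _, value = line.rpartition("=")
        # Convert to int: left as a string, the type would never compare
        # equal to the integer token types the parser uses.
        token_types[name] = int(value)
    return token_types


# Hypothetical sample content for illustration only.
sample = "ID=4\nINT=5\n'='=6\n"
types = parse_tokens_file(sample)
```

Importing the constants from the generated parser module avoids this step entirely, which is why it ended up being the simpler option.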
But I finally figured it out: http://github.com/tinku99/antlr-pygments
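The general shape of the glue code can be sketched roughly as follows. This is not the code from the repository above, only an illustration under stated assumptions: the external lexer yields `(kind, text)` pairs (as a Pygments lexer's `get_tokens()` does), the parser pulls tokens by calling `nextToken()` until it sees an EOF token, and EOF has type -1. `ExternalTokenSource`, `SimpleToken`, and the token type numbers are hypothetical; a real adapter would presumably return the ANTLR runtime's own token objects and take its type numbers from the generated parser.

```python
from collections import namedtuple

EOF_TYPE = -1  # assumed EOF token type, following ANTLR 3's convention

# Stand-in for the runtime's token class (hypothetical, illustration only).
SimpleToken = namedtuple("SimpleToken", "type text")


class ExternalTokenSource:
    """Adapts (kind, text) pairs from an external lexer to a nextToken() API."""

    def __init__(self, pairs, type_map, skip=()):
        self._pairs = iter(pairs)   # e.g. the output of lexer.get_tokens(code)
        self._type_map = type_map   # external kind -> integer token type
        self._skip = set(skip)      # kinds the parser should never see

    def nextToken(self):
        # Resume the shared iterator, dropping skipped kinds such as whitespace.
        for kind, text in self._pairs:
            if kind in self._skip:
                continue
            return SimpleToken(self._type_map[kind], text)
        # Input exhausted: keep answering with EOF on every further call.
        return SimpleToken(EOF_TYPE, "<EOF>")


# Plain strings stand in for the external lexer's token kinds here.
pairs = [("Name", "x"), ("Whitespace", " "), ("Operator", "="),
         ("Whitespace", " "), ("Number", "1")]
type_map = {"Name": 4, "Operator": 5, "Number": 6}
source = ExternalTokenSource(pairs, type_map, skip={"Whitespace"})

tokens = []
while True:
    tok = source.nextToken()
    tokens.append(tok)
    if tok.type == EOF_TYPE:
        break
```

The mapping is keyed by whatever objects the external lexer yields, so with Pygments the keys would be its token type objects rather than strings.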
Other tips
Naveen wrote:
Has anyone tried using a Pygments lexer with the ANTLR Python target?
I doubt it. At least, I have never seen anyone mention this, either here on SO or on the ANTLR mailing lists (which I have been monitoring for quite some time now).
Naveen wrote:
Can I just hand-write a grammarname.tokens file that the ANTLR parser can import?
No. The parser expects an instance of a Lexer object, which is present in the (Python) runtime. A .tokens file is not supposed to be edited by hand.
Naveen wrote:
When I use an ANTLR lexer, there are a bunch of anonymous tokens. Can I just remove them?
I'm not quite sure what you mean, but removing any of the generated code seems like a bad idea to me. If you're referring to the .tokens file, as I mentioned before: it is not supposed to be edited by hand.
I really wouldn't bother trying to "glue" some external lexer grammar, or complete lexer, into ANTLR. I am pretty sure this will take you more time to implement than just writing the ANTLR lexer grammar yourself. After all, defining the lexer rules is the easiest part of a language in most cases.