Is there a way I can modify a ParseTree and its accompanying TokenStream?

https://stackoverflow.com//questions/23022079

21-12-2019

Question

My question is both a language implementation question and an ANTLR4 API question. Is there a way I can modify a ParseTree and its accompanying TokenStream?

Here is the scenario. I have a simple language that defines a dataflow program. You can see it on GitHub, if you're curious. I lex and parse the language with ANTLR4. I use listeners to walk the parse tree and evaluate the code.

The problem I have most recently run into is I need to be able to modify the code at runtime. I need to be able to define new objects and create instances from them. Note, I'm not referring to having reflection in the language. I'm referring to having a program like an IDE modify the internal representation of the source code.

I have started off down the path of defining a bunch of definition objects to create an AST, but I just realized this approach will require me to come up with my own solutions for walking the AST. Rather than reinvent the wheel, I'd rather use ANTLR's listeners/visitors.

Another problem I face is the need to be able to output the current state of the AST as code at any point in time (the tool I'm embedding the language in needs to be able to save). I am using StringTemplate to generate the code from my definition objects. I think I should be able to make ST render the parse tree.

In general, I need to be able to lex, parse, evaluate, refactor, evaluate, and generate code all from within my runtime.

Rather than create my own definition objects, I'm wondering what the best approach is to modify the ParseTree/TokenStream?

Solution

I checked out your language. It looks pretty simple, and I'm assuming it is.

From your description I'm working on the basis that the IDE will operate directly on the tree. Given that, you need:

A parser for your language, to convert source code into a tree. ANTLR can do this, but you may need to build your own tree rather than rely on what is provided. Writing your own parser is not that hard.
Tree rewriting rules. A series of permitted transformations of the tree that ensure it remains valid. This can be in the form of an API that operates on the tree. This page may be helpful: http://www.program-transformation.org/Transform/TreeRewriting
Source code generation. The ability to reconstruct source code from the tree. If you need to preserve comments and formatting, then either (a) include them in the grammar or (b) merge generated source code with authored code.
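
To make the last two items concrete, here is a minimal sketch using only the Java standard library. It is not an ANTLR ParseTree: the Node class, the rename rule, and the "connect src -> sink" statement are all invented for illustration, and the real language's syntax will differ. It shows one rewrite rule exposed as an API that returns a new tree, plus a naive renderer that turns a tree back into source text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a hand-built tree; not an ANTLR ParseTree.
final class TreeRoundTrip {

    /** Immutable node: leaves carry text, inner nodes carry children. */
    static final class Node {
        final String kind;
        final String text;            // null for inner nodes
        final List<Node> children;

        private Node(String kind, String text, List<Node> children) {
            this.kind = kind;
            this.text = text;
            this.children = Collections.unmodifiableList(new ArrayList<>(children));
        }

        static Node leaf(String kind, String text) {
            return new Node(kind, text, Collections.<Node>emptyList());
        }

        static Node rule(String kind, Node... children) {
            return new Node(kind, null, Arrays.asList(children));
        }
    }

    /** Tree rewriting: returns a new tree with matching leaves renamed; the input is untouched. */
    static Node rename(Node node, String leafKind, String from, String to) {
        if (node.text != null) {
            boolean match = node.kind.equals(leafKind) && node.text.equals(from);
            return match ? Node.leaf(node.kind, to) : node;
        }
        List<Node> rewritten = new ArrayList<>();
        for (Node child : node.children) {
            rewritten.add(rename(child, leafKind, from, to));
        }
        return new Node(node.kind, null, rewritten);
    }

    /** Source generation: joins leaf text with single spaces (formatting and comments are not preserved). */
    static String render(Node node) {
        if (node.text != null) {
            return node.text;
        }
        StringBuilder out = new StringBuilder();
        for (Node child : node.children) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(render(child));
        }
        return out.toString();
    }

    /** A made-up statement in an invented syntax, used only for the demo. */
    static Node sample() {
        return Node.rule("connect",
                Node.leaf("KEYWORD", "connect"),
                Node.leaf("ID", "src"),
                Node.leaf("ARROW", "->"),
                Node.leaf("ID", "sink"));
    }

    public static void main(String[] args) {
        Node original = sample();
        Node renamed = rename(original, "ID", "src", "input");
        System.out.println(render(original));
        System.out.println(render(renamed));
    }
}
```

Because rename never mutates its input, the IDE can keep the old tree around for undo, and any listener or evaluator still holding it is unaffected. A real renderer would need to carry whitespace and comments to satisfy the formatting point above.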

The end result is that you can round-trip any change in either the source code or the generated tree. If you think this is a lot of work, it is. However, it is the basis for many other tools that aim to provide these facilities. The saving grace is that your language is simple.

OTHER TIPS

We have no means to alter a parse tree safely at the moment. It's best to derive a new version from the old. You should also look at the token stream rewrite engine thingie. Ter
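
In the ANTLR 4 Java runtime, the token stream rewrite engine is the TokenStreamRewriter class. As far as I understand it, it queues insert, replace, and delete instructions keyed by token index and applies them only when the text is requested, so the original token stream is never changed. The sketch below is a stdlib-only illustration of that idea, not the ANTLR class: LazyTokenRewriter and its methods are hypothetical names, and tokens are plain strings (whitespace included) rather than ANTLR Token objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stdlib-only illustration of lazy token rewriting; NOT the ANTLR class.
final class LazyTokenRewriter {
    private final List<String> tokens;   // original token texts, never mutated
    private final Map<Integer, String> replacements = new HashMap<>();
    private final Map<Integer, String> insertions = new HashMap<>();

    LazyTokenRewriter(List<String> tokens) {
        this.tokens = new ArrayList<>(tokens);
    }

    /** Queue a replacement for the token at index; a later call for the same index wins. */
    void replace(int index, String text) {
        checkIndex(index);
        replacements.put(index, text);
    }

    /** Queue a deletion, modelled as replacement with the empty string. */
    void delete(int index) {
        replace(index, "");
    }

    /** Queue text to emit before the token at index; repeated calls append in call order. */
    void insertBefore(int index, String text) {
        checkIndex(index);
        insertions.merge(index, text, String::concat);
    }

    /** The text of the untouched token list. */
    String getOriginalText() {
        return String.join("", tokens);
    }

    /** Applies the queued edits while rendering; the token list itself stays as it was. */
    String getText() {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            out.append(insertions.getOrDefault(i, ""));
            out.append(replacements.getOrDefault(i, tokens.get(i)));
        }
        return out.toString();
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= tokens.size()) {
            throw new IndexOutOfBoundsException("token index " + index);
        }
    }

    public static void main(String[] args) {
        LazyTokenRewriter rewriter =
                new LazyTokenRewriter(Arrays.asList("a", " ", "=", " ", "1", ";"));
        rewriter.replace(4, "2");
        rewriter.insertBefore(0, "let ");
        System.out.println(rewriter.getOriginalText());
        System.out.println(rewriter.getText());
    }
}
```

Rendering the edited text and re-parsing it is one way to "derive a new version from the old": the new parse tree comes from the parser, so it is valid by construction.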

The short answer to your question is yes, but as you surmise, it will be painful and fragile.

A far more workable alternative, unless you have a hard requirement explicitly to modify the 'internal' representation of the source code/parse tree, is to simply re-parse the modified source code between (virtually) every keystroke. Just grab the current contents of the active editor as a String and feed it as a stream to the Lexer.

While this may sound like an expensive operation, ANTLR is actually quite fast. Eclipse in particular works well with this approach: I have used it with a number of DSL editors without any noticeable impact on editor performance. The parse occurs entirely on a background thread. Editor problemMarkers are only updated when there is a sufficient pause in the foreground editing thread. NetBeans should be similar.
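
The "re-parse after a pause, on a background thread" pattern can be sketched with the standard library alone. This is an assumption about how one might wire it up, not how Eclipse implements it: DebouncedReparser is a hypothetical name, and the parse function is a placeholder for wherever the editor text would be handed to the lexer and parser.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical helper: re-parses only after the editor has been quiet for a while.
final class DebouncedReparser<T> implements AutoCloseable {
    private final ScheduledExecutorService worker = Executors.newSingleThreadScheduledExecutor();
    private final Function<String, T> parse;   // placeholder for lexing and parsing the text
    private final Consumer<T> onResult;        // e.g. update problem markers
    private final long quietMillis;
    private ScheduledFuture<?> pending;

    DebouncedReparser(Function<String, T> parse, Consumer<T> onResult, long quietMillis) {
        this.parse = parse;
        this.onResult = onResult;
        this.quietMillis = quietMillis;
    }

    /** Call on every edit; cancels the not-yet-started parse and schedules a fresh one. */
    synchronized void textChanged(String text) {
        if (pending != null) {
            pending.cancel(false);
        }
        pending = worker.schedule(
                () -> onResult.accept(parse.apply(text)), quietMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        worker.shutdownNow();
    }

    /** Simulates three quick keystrokes and returns the texts that were actually parsed. */
    static List<String> demo() {
        List<String> parsed = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(1);
        try (DebouncedReparser<Integer> reparser = new DebouncedReparser<>(
                text -> { parsed.add(text); return text.length(); },
                length -> done.countDown(),
                100)) {
            reparser.textChanged("a");
            reparser.textChanged("ab");
            reparser.textChanged("abc");
            done.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return parsed;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Only the last of the three rapid edits is parsed, because each call cancels the one scheduled before it. In a real editor, onResult would need to hand its result back to the UI thread before touching any widgets.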

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow