Your favourite Abstract Syntax Tree optimization

https://stackoverflow.com/questions/1825298

22-07-2019
|

Question

If you were constructing a compiler, what optimization at the AST level would be the nicest to have?

Solution

An optimisation that is easiest to do on the AST (rather than, say, the CFG) is tail-call optimisation: if you see a subtree of the form:

RETURN
    CALL f
        ARGS x, y, ...

You can replace it with a jump to f. If f(a, b) is the function that the tail-call appears in, the replacement is as simple as:

a = x; b = y
JUMP to root of tree

I find it easiest to represent that jump as a special "restart" statement, which the AST->CFG translation treats as an edge back to the first node. Jumping to other functions is a bit trickier since you can't just set local variables, you need to actually think ahead how arguments are passed to them and prepare to translate this at a lower level. For example, the AST might need a special node that can instruct a later pass to overwrite the current stack frame with the arguments and jump accordingly.

OTHER TIPS

Mostly you can't do interesting optimizations at the AST level, because you need information how how data flows from one part of the program to another. While data flow is implicit in the meaning of the AST, it isn't easily determined by inspecting just the AST, which is why people building compilers and optimizers build other program representations (including symbol tables, control flow graphs, reaching definitions, data flow and SSA forms, etc.).

Having a parser for a language is the easy part of analyzing/manipulating that language. You need all that other stuff to do a good job.

If you do have all those other representations, you can think about doing optimizations at the AST level. Most folks building compilers don't bother; they convert to a data flow representation and simply optimize that. But if you want to reproduce source code with changes, you need the AST. You'll also need a prettyprinter to enable you to regenerate the source code. If you go this far, you'll end up with a source-to-source program transformation system.

The DMS Software Reengineering Toolkit is a system that transforms ASTs, using all these other representations to enable the analyses needed by the transforms.

One advantage of applying optimizations in the AST is that it may reduce the execution time of some back-end optimization pass. However, I believe that these optimizations need to be done with parsimony because you may be hindering further optimizations in the code.

That being said, IMO one simple yet interesting optimization to be applied at AST-level or during IR generation is the simplification of boolean expression of the form (1 || 2) or (2 < 3 || A), etc. to they net result. I believe that some simple peephole optimizations may be worthwhile too.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow