Code optimization - syntax tree vs. intermediate representation

https://cs.stackexchange.com/questions/128183

29-09-2020

Question

I'm working on a compiler for my own custom language. As I was reading an article on code optimization, I noticed that it assumed that the intermediate representation of the code had already been formed. Though I haven't yet started writing the optimization section of my compiler, I've been going through it in my head and it seems preferable to have the optimizer operate on the syntax tree before converting to the intermediate representation.

Is there a reason to prefer one approach to the other, or is it mostly a matter of personal taste?

For example, suppose I have an if block like

if ( some_expression ) {
    do_stuff
}

If the compiler could recognize that some_expression always evaluates to true and has no side effects, then I could remove its computation simply by pruning the tree.
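To make the tree-pruning idea concrete, here is a minimal sketch in Python. The node classes (Const, Var, BinOp, Call, If) and the functions fold and prune are hypothetical names invented for this illustration, not part of any real compiler. It assumes expressions in this toy tree never have side effects, so a condition that folds to a constant can be dropped safely.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical, side-effect-free expression nodes for a toy syntax tree.
@dataclass
class Const:
    value: object


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: object
    right: object


# Hypothetical statement nodes.
@dataclass
class Call:
    name: str


@dataclass
class If:
    cond: object
    body: List[object]


def fold(expr):
    """Fold constant subexpressions bottom-up; leave everything else alone."""
    if isinstance(expr, BinOp):
        left, right = fold(expr.left), fold(expr.right)
        if isinstance(left, Const) and isinstance(right, Const):
            if expr.op == "==":
                return Const(left.value == right.value)
            if expr.op == "<":
                return Const(left.value < right.value)
        return BinOp(expr.op, left, right)
    return expr


def prune(stmts):
    """Replace 'if' statements whose condition folds to a constant."""
    out = []
    for stmt in stmts:
        if isinstance(stmt, If):
            cond = fold(stmt.cond)
            body = prune(stmt.body)
            if isinstance(cond, Const):
                if cond.value:
                    out.extend(body)  # always taken: keep only the body
                continue              # never taken: drop the whole statement
            out.append(If(cond, body))
        else:
            out.append(stmt)
    return out


tree = [If(BinOp("<", Const(1), Const(2)), [Call("do_stuff")])]
pruned = prune(tree)
```

Here pruned contains only the Call node: the condition 1 < 2 folds to a true constant, so the If wrapper disappears. A condition that mentions a Var is left untouched, because this sketch knows nothing about variable values.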

However, if I had already converted the tree to an intermediate representation, such as an assembly-ish list of simple instructions, the process of recognizing and resolving the scenario would be (in my imagination, since I haven't yet attempted to implement this) far more complicated.

Solution

These days, the trend is to do optimization with the intermediate representation. Check out LLVM for example:

The LLVM Core libraries provide a modern source- and target-independent optimizer, along with code generation support for many popular CPUs (as well as some less common ones!) These libraries are built around a well specified code representation known as the LLVM intermediate representation ("LLVM IR"). The LLVM Core libraries are well documented, and it is particularly easy to invent your own language (or port an existing compiler) to use LLVM as an optimizer and code generator.

LLVM can provide a source-independent optimizer because it performs its optimization passes over the code only after it is represented in LLVM IR.

Why might it be far more complicated to do this? It depends on your intermediate representation. One of the design goals for the intermediate representation would be to facilitate various optimizations rather than make them needlessly complicated. See all the optimizations that LLVM can achieve working on the LLVM IR, for example, in this list of analysis and transform passes.
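As a rough illustration of how the if example from the question might look on a linear IR, here is a minimal sketch in Python. The instruction set (const, br_false, jmp, call, label) and the functions fold_branches and drop_unused_consts are hypothetical and invented for this example; this is not LLVM IR and not how LLVM implements its passes. It assumes each temporary is assigned at most once, so a const instruction fixes that temporary's value everywhere.

```python
def fold_branches(instrs):
    """Resolve conditional branches whose condition is a known constant.

    Assumes every temporary is assigned at most once (SSA-like), so a
    'const' instruction determines its value for the whole function.
    """
    known = {ins[1]: ins[2] for ins in instrs if ins[0] == "const"}
    out = []
    for ins in instrs:
        if ins[0] == "br_false" and ins[1] in known:
            if known[ins[1]]:
                continue                  # condition is true: branch never taken
            out.append(("jmp", ins[2]))   # condition is false: always jump
            continue
        out.append(ins)
    return out


def drop_unused_consts(instrs):
    """Remove 'const' instructions whose result no other instruction reads."""
    used = {arg for ins in instrs if ins[0] != "const" for arg in ins[1:]}
    return [ins for ins in instrs
            if not (ins[0] == "const" and ins[1] not in used)]


# Hypothetical lowering of: if (true) { do_stuff }
program = [
    ("const", "t0", True),
    ("br_false", "t0", "L_end"),
    ("call", "do_stuff"),
    ("label", "L_end"),
]
optimized = drop_unused_consts(fold_branches(program))
```

After both steps, optimized holds only the call and the label: the branch is gone because its condition is known to be true, and the constant is gone because nothing reads it anymore. Splitting the work into two small passes mirrors the usual IR style, where each pass does one simple thing and later passes clean up after earlier ones.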

Licensed under: CC-BY-SA with attribution
