Abstract Syntax Tree - Compiler Phases

Question

Although this is an old question, it is standard computer science compiler construction course material, so perhaps needs an answer recorded.

Your question does not have one single answer, because there is not one single way of implementing a compiler. There is no perfect and correct answer, which is what many students believe there should always be. Examiners often ask this kind of question for students to show understanding and understanding cannot be demonstrated by the rote memory of a single answer, which is what I believe is being sought for here.

Compilers can be implemented as single pass or multiple pass; an abstract syntax tree can be an in-memory data structure or it could be recorded in some more substantive way by being output into a file, which in the earliest compilers had a physical representation by being punched on to paper tape or cards! If the data that moved between the compiler phases (be it tokens, abstract-syntax-trees, intermediate code or target code) was output thus would the set of cards holding one abstract-syntax-tree and the set of cards holding the other abstract-syntax-tree be different trees or an original tree and another version of that tree? Even then, philosophically, you could argue it either way.

In modern compilers the data may (often does) move between the phases along pipes between multiple concurrent tasks forming the different components of the compiler; then again we have one tree in on one pipe and another tree out on another pipe.

However, if we implemented the compiler in a single program, running in a single process, then we need not have one tree in and one tree out. All parts of the compiler can see all the storage. Once the syntax analysis is complete there is a tree structure in memory that can be processed by the semantic analyser. As the syntax analyser will never be needed again, and the semantic analyser will never run again once it has processed the syntax tree, there is no need to make another tree. It is more efficient to walk the tree in memory, as a data structure, and modify the tree in situ. In your example, the node intofloat is inserted in the tree by the semantic analyser to represent the type conversion required at this point in the code. Therefore in this case we might say we have one single tree which is manipulated by all phases of the compiler.

One should now consider the intofloat node that has been added. You may have assumed that it is a new type or attribute to a tree node object. That may not be the case. It could be considered a form of operator like the =, + and * nodes that already exist. It may already be in the set of operators available. All the semantic analysis has done is to add explicitly an operator to the tree that could have been included explicitly in the code in the first place. No new form of node is actually created, only another instance of an existing node type. It could be the way you imagined it, or it could not. Neither version is the right or wrong way.

The question of visitor patterns is orthogonal to the compiler question. Visitor patterns are mainly a feature of object-oriented programming and languages. You could implement the semantic analyser using a visitor pattern on the tree or you could not, as indicated in the last paragraph. It could be done either way, and again there is no one correct answer. It is the discussing of the role of visitor patterns in such a question that can be used to demonstrate both the understanding of visitor patterns and their application, and the purpose and role of abstract-syntax-trees. Thus if you are looking for the one right answer (yes/no) it does not exist. Only the debate exists.

Through the debate and comparison of what the words in the question mean we can show understanding.

If this is part of a course assignment of examination, it is the demonstration of understanding that is sought, not the one magic answer.