AST design: Call is both expression and statement?
https://softwareengineering.stackexchange.com/questions/370450
-
05-02-2021 - |
Pergunta
I'm designing the AST for a compiler and found that I don't actually know how to represent the Call
node.
Currently, the relevant part of the AST looks like this (where the arrows denote inheritance):
ASTNode -> Expression -> Call
-> Declaration -> Statement -> Call
-> TypeDeclaration, etc...
For example,
// this is a declaration
def f(x: Integer from 0 to 16) - > {
// This is a call that is also a statement
print(x)
}
How do I reconcile these two definitions of Call
? In this particular grammar, the only expression that is also a statement is Call
. The language I'm using only supports single inheritance.
Is there a way to sidestep this problem entirely without compromising on the type safety of a heterogeneous AST?
As a side note, any additional reading you can recommend would be great.
Solução
The AST is a bridge between the concrete syntax and the semantic model of your language. It does not have to conform exactly to the syntax. In particular, it is generally OK if the AST could theoretically represent structures that cannot be rendered in the syntax. (Notably, the reStructured Text markup language is defined in terms of a document object model that is substantially more expressive than the actual syntax.)
Therefore, you could introduce an expression-statement AST node, even though the syntax constrains this expression to be a call. This is sensible if statement-calls and expression-calls do not have to be treated differently in your language.
An example where statements and expressions do have to be treated differently is a function definition in JavaScript. Roughly, the statement function foo() {}
declares a foo
variable that holds a function object. The same definition in an expression will not introduce a variable in the enclosing scope, and evaluates to that function object.
It could then be sensible to represent this using different types, e.g. FunctionExpression
versus FunctionStatement
. This does not have to lead to excessive duplication! E.g. the statement variant could contain a FunctionExpression
object. Alternatively, the function statement could be immediately desugared in the AST to something like Assign(Var("foo"), Function(...))
where the Function
AST node only has to model the expression variant.
So in your case, introducing a CallStatement
AST node that merely contains the Call
expression AST node could be a viable approach.
Note that type safety is relative, anyway. By using inheritance to model your AST nodes, your type system is allowing arbitrary new AST node types as long as they conform to some interface. E.g. it would be possible to create an expression subtype that serves as an adapter to any declaration, even though that seems to make little sense in your model. If you need stricter guarantees, you will have to avoid open inheritance and use a language with more constrained sum types.