AST design: Call is both expression and statement?

https://softwareengineering.stackexchange.com/questions/370450

05-02-2021
|

Pergunta

I'm designing the AST for a compiler and found that I don't actually know how to represent the Call node.

Currently, the relevant part of the AST looks like this (where the arrows denote inheritance):

ASTNode -> Expression  -> Call
        -> Declaration -> Statement -> Call
                       -> TypeDeclaration, etc...

For example,

// this is a declaration 
def f(x: Integer from 0 to 16) - > {
    // This is a call that is also a statement
    print(x)
}

How do I reconcile these two definitions of Call? In this particular grammar, the only expression that is also a statement is Call. The language I'm using only supports single inheritance.

Is there a way to sidestep this problem entirely without compromising on the type safety of a heterogeneous AST?

As a side note, any additional reading you can recommend would be great.

Solução

The AST is a bridge between the concrete syntax and the semantic model of your language. It does not have to conform exactly to the syntax. In particular, it is generally OK if the AST could theoretically represent structures that cannot be rendered in the syntax. (Notably, the reStructured Text markup language is defined in terms of a document object model that is substantially more expressive than the actual syntax.)

Therefore, you could introduce an expression-statement AST node, even though the syntax constrains this expression to be a call. This is sensible if statement-calls and expression-calls do not have to be treated differently in your language.

An example where statements and expressions do have to be treated differently is a function definition in JavaScript. Roughly, the statement function foo() {} declares a foo variable that holds a function object. The same definition in an expression will not introduce a variable in the enclosing scope, and evaluates to that function object.

It could then be sensible to represent this using different types, e.g. FunctionExpression versus FunctionStatement. This does not have to lead to excessive duplication! E.g. the statement variant could contain a FunctionExpression object. Alternatively, the function statement could be immediately desugared in the AST to something like Assign(Var("foo"), Function(...)) where the Function AST node only has to model the expression variant.

So in your case, introducing a CallStatement AST node that merely contains the Call expression AST node could be a viable approach.

Note that type safety is relative, anyway. By using inheritance to model your AST nodes, your type system is allowing arbitrary new AST node types as long as they conform to some interface. E.g. it would be possible to create an expression subtype that serves as an adapter to any declaration, even though that seems to make little sense in your model. If you need stricter guarantees, you will have to avoid open inheritance and use a language with more constrained sum types.

Licenciado em: CC-BY-SA com atribuição

Não afiliado a softwareengineering.stackexchange