Defining Function Signatures in a Simple Language Grammar

https://stackoverflow.com/questions/13261710

27-11-2021
|

Question

I am currently learning how to create a simple expression language using Irony. I'm having a little bit of trouble figuring out the best way to define function signatures, and determining whose responsibility it is to validate the input to those functions.

So far, I have a simple grammar that defines the basic elements of my language. This includes a handful of binary operators, parentheses, numbers, identifiers, and function calls. The BNF for my grammar looks something like this:

<expression> ::= <number> | <parenexp> | <binexp> | <fncall> | <identifier>
<parenexp>   ::= ( <expression> )
<fncall>     ::= <identifier> ( <argumentlist> )
<binexp>     ::= <expression> <binop> <expression>
<binop>      ::= + - * / %
... the rest of the grammar definition

Using the Irony parser, I am able to validate the syntax of various input strings to make sure they conform to this grammar:

x + y / z * AVG(a + b, p)   -> Valid Syntax
x +/ AVG(x                  -> Invalid Syntax

All that is well and good, but now I want to go a step further and define the available functions, along with the number of parameters that each function requires. So for example, I want to have a function FOO that accepts one parameter and BAR that accepts two parameters:

FOO(a + b) * BAR(x + y, p + q)    -> Valid
FOO(a + b, 13)                    ->  Invalid

When the second statement is parsed, I'd like to be able to output an error message that is aware of the expected input for this function:

Too many arguments specified for function 'FOO'

I don't actually need to evaluate any of these statements, only validate the syntax of the statements and determine if they are valid expressions or not.

How exactly should I be doing this? I know that technically I could simply add the functions to the grammar like so:

<foofncall> ::= FOO( <expression> )
<barfncall> ::= BAR( <expression>, <expression> )

But something about this doesn't feel quite right. To me it seems like the grammar should only define a generic function call, and not every function available to the language.

How is this typically accomplished in other languages?
What are the components called that should handle the responsibilities of analyzing the basic syntax of the language grammar versus the more specific elements like function definitions? Should both responsibilities be handled by the same component?

Solution

While you can do typechecking in directly in the grammar so its enforced in the parser, its generally a bad idea to do so. Instead, the parser should just parse the basic syntax, and separate typechecking code should be used for typechecking.

In the normal case of a compiler, the parser just produces an abstract syntax tree or some equivalent representation of the program. Then, a typechecking pass is run over the AST that ensures all types match appropriately -- ensures that functions have the right number of arguments and those arguments have the right type, as well as ensuring that variables have the right type for what is assigned to them and how they are used.

Besides being generally simpler, this usually allows you to give better error messages -- instead of just 'Invalid', you can say 'too many arguments to FOO' or what have you.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow