Question

Could someone please give me some advice/ideas about how to deal with the situations when it's needed to have a look at further declarations to be able to make correct semantic actions on current moment? For example, it is a well-known occurrence when someone writes an interpreter/compiler of some programming language which doesn't support "forward declarations". Let's have an example:

foo(123);//<-- our parser targets here. we estimate we have a function 
         //    invocation, but we have no idea about foo declaration/prototype,
         //     so we can't be sure that "foo" takes one integer argument.   


void foo(int i){
//...
}

It is pretty clear we have to have at least two passes. Firstly we parse all function declarations and get all the needed information such as: the amount arguments the function takes, their types and then we are able to deal with function invocations and resolving the difficulties as above. If we go this way we will must do all these passes using some AST traversing mechanisms/visitors. In this case we have to deal with AST traversing/applying visitors and we must say "good bye" to the all the beauty of phoenix constructions integrated directly in our parsers.

How would you deal with this?

Was it helpful?

Solution

[2nd answer, on semantics] This particular example happens to be simple. What you can do is record function calls made to yet undeclared functions, and the actual argument types. When you do encounter a function declaration later, you check if there are preceding function calls that are (better) matched to this new function. You will obviously detect errors only at the end of the parse, becuase the very last line could introduce a missing function. But after that line, any function call that hasn't been matched at all is an error.

Now, the problem is that this works for simple semantics. If you look at more complex languages - e.g. with C++-like function templates - it no longer becomes possible to do such lookups in a simple table. You would need specialized tabes that structurally match your language constructs. An AST just isn't the best structure for those, let alone the partial AST during parsing.

OTHER TIPS

If you want to do two passes, instead of semantic checking at the end of the first pass, you can have te functions called by your actions know which pass they are in. So if you had some actions

[functionCall(name, args)]
[functionDef(name, args, body)]

They would be defined something like this (not proper spirit syntax, but you get the point)

functionCall(string name, vector<string> args)
{
  if (!first_pass) {
    // check args for validity
    // whatever else you need to do
  }
} 

functionDef(string name, vector<string> args, ... body)
{
  if (first_pass)
    // Add function decleration to symbol table
  else
    // define function
}

I think you're making unfounded assumptions. For instance, "it is pretty clear we have to have at least two passes". No it isn't. If the syntax is such that foo(123) can only be parsed as function-name "(" expression ")", then one pass is enough.

Therefore I would advise to design your syntax for unambiguous parsing. Avoid constructs that cannot be parsed in isolation, , e.g. avoid dependendencies on declarations elesewhere.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top