Question

Google's new language "Go" says on its website:

the language has been designed to be easy to analyze and can be parsed without a symbol table

I'm certainly no expert on these matters, but I thought a symbol table was a basic construct common to all compilers for languages that use variables, and Go clearly uses variables. What am I not understanding?


Solution

Parsing means just figuring out the program structure: separating the module into statements/declarations, breaking expressions down to sub-expressions, etc. You end up with a tree structure, known as a "parse tree", or "abstract syntax tree" (AST).

C++, by contrast, apparently does require a symbol table to parse.

This page discusses some of the reasons why C++ needs a symbol table for parsing.

Of course, parsing is only a part of compilation, and you will need a symbol table to do a full compilation.

However, parsing itself can be useful in writing analysis tools (e.g. which module imports which modules). So, simplifying the parsing process means it's easier to write code analysis tools.
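As a small illustration of that point, here is a sketch using Go's standard go/parser package to pull the import list out of a source file. With the parser.ImportsOnly mode, parsing stops right after the import block, and no symbol table is built or consulted at any point (the source string is a made-up example):

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// importPaths parses src just far enough to collect its import paths.
// No symbol table is involved; this is pure syntax analysis.
func importPaths(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var paths []string
	for _, imp := range f.Imports {
		paths = append(paths, imp.Path.Value) // includes the quotes, e.g. "fmt"
	}
	return paths, nil
}

func main() {
	src := "package demo\n\nimport (\n\t\"fmt\"\n\t\"net/http\"\n)\n"
	paths, err := importPaths(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(paths)
}
```

This is exactly the kind of "which module imports which modules" tool mentioned above, and it needs nothing beyond the parser.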

OTHER TIPS

Interpretation and compilation absolutely require symbol tables or similar. This is true for nearly all languages.

In C and C++, even parsing the language requires a symbol table.

@Justice is right. To expand on that a little: in C, the only really tricky part is telling types apart from variables. Specifically, when you see this:

T t;

You need to know that T is a type for that to be a legal parse. That's something you have to look up in a symbol table. This is relatively simple to figure out as long as types are added to the symbol table as the parse continues. You don't need to do much extra work in the compiler: either T is present in the table or it isn't.
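A toy sketch of that decision (written in Go; the names and the map-based "symbol table" are illustrative, not any real compiler): the same token sequence is a declaration if the first identifier names a known type, and an expression statement otherwise, so the parser just has to check the table.

```go
package main

import "fmt"

// classify decides how to parse the C-like statement "A B;":
// a declaration if A is a registered type name, an expression otherwise
// (compare "T t;" with something like "a * b;").
func classify(types map[string]bool, first string) string {
	if types[first] {
		return "declaration"
	}
	return "expression"
}

func main() {
	types := map[string]bool{"int": true}
	fmt.Println(classify(types, "int")) // declaration
	fmt.Println(classify(types, "x"))   // expression

	// Once something like "typedef int T;" has been parsed, T is added
	// to the table, and the very same tokens parse differently.
	types["T"] = true
	fmt.Println(classify(types, "T")) // declaration
}
```

That is the whole trick in C: either the name is present in the table or it isn't.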

In C++ things are much, much more complicated. There are enormous numbers of ambiguous or potentially ambiguous constructs. The most obvious is this one:

B::C (c);

Aside from the fact that it's not clear whether B is a class, a typedef, or a namespace, it's also not clear whether C is a type and c an object of that type, or whether C is a function (or constructor) taking c as an argument (or even an object with operator() overloaded). You need the symbol table to carry on parsing, though parsing can still proceed quickly enough, since the kind of each symbol is recorded in the table.

Things get much, much worse when templates come into the mix. If C (c) appears inside a template, you may not know, at the point of the template's definition, whether C is a type or a function/object, because the template can declare C to be either a type or a variable. This means you need the symbol table but don't have it yet; you can't have it until the template is actually instantiated. Even worse, the kind of the symbol is not always sufficient: you can construct situations that require full information about the type the symbol represents, including its size, alignment, and other machine-specific details.

All this has several practical effects. The two most significant I would say are:

  • Compilation is much faster. I assume Go is faster to compile than C, and C++ has famously slow compilation times for situations involving a lot of templates.
  • You can write parsers that don't depend on having a full compiler. This is very useful for doing code analysis and for refactoring.

To parse most languages you need to know when names are variables, types or functions to disambiguate certain constructs. Go has no such ambiguous constructs.

For instance:

int x = Foo(bar);

Foo could be a type or a function, and a cast/conversion and a call are represented by different AST node types, so in such languages the parser has to look Foo up to know which tree to build. In Go, the parser never has to do lookups on symbols to construct the AST. The grammar and the AST are just simpler than in most languages. Pretty cool, really.

Symbol tables are slow and often not needed, so Go chose to do away with them; many functional languages need none either. Fast lookup requires hashing, but supporting nested scopes means pushing and popping names as scopes open and close. Simple symbol tables are implemented as a linearly searched stack, better ones as a hash table with a stack of bindings per symbol. Either way, the search has to be done at run time.

Interpretation and compilation of lexically scoped languages require no symbol tables or anything similar. Only dynamically scoped symbols need symbol tables, and some compilers for statically typed languages keep an internal symbol table of sorts to hold the type annotations.

In C and C++, even parsing the language requires a symbol table, because you need to store the types and declarations of globals and functions.

Lexically scoped symbols are not stored in symbol tables but as indexed lists of names in block frames, as in functional languages. Those indices are computed at compile time, so run-time access is immediate. Leaving a scope automatically makes its variables inaccessible, so there is no need to push and pop names in namespaces or symbol tables.
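A minimal sketch of that scheme in Go (the names and frame layout are illustrative, not any particular implementation): at "compile time" each variable is assigned a (depth, index) pair, and at run time the value is fetched by walking depth parent frames and indexing into a slot array, with no name search at all.

```go
package main

import "fmt"

// frame is one block's storage: a slot array plus a link to the
// enclosing scope's frame.
type frame struct {
	parent *frame
	slots  []int
}

// lookup fetches the value at a precomputed lexical address
// (depth frames up, slot index within that frame).
func lookup(env *frame, depth, index int) int {
	for i := 0; i < depth; i++ {
		env = env.parent
	}
	return env.slots[index]
}

func main() {
	// Outer scope holds x=10, y=20; inner scope holds z=30.
	outer := &frame{slots: []int{10, 20}}
	inner := &frame{parent: outer, slots: []int{30}}

	fmt.Println(lookup(inner, 0, 0)) // 30: z, in the local frame
	fmt.Println(lookup(inner, 1, 1)) // 20: y, one frame up
}
```

The (depth, index) pairs play the role the names played in a symbol table, but they are resolved once, before the program runs.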

Languages without first-class functions, by contrast, often need to store their function names in symbol tables. As a language designer you try to bind functions to lexicals instead, to be able to get rid of dynamic name lookup in symbol tables.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow