I think part of the confusion stems from the fact that "symbol table" means different things to different people, and potentially at different stages in the compilation process.
It is generally agreed that the lexer splits the input stream into tokens (sometimes referred to as lexemes or terminals). These, as you say, can be categorized into different types: numbers, keywords, identifiers, punctuation symbols, and so on.
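To make that concrete, here is a minimal lexer sketch. The token categories and regular expressions are illustrative, not taken from any particular compiler:

```python
import re

# Illustrative token categories; a real lexer would have many more.
# Order matters: KEYWORD must be tried before IDENT so that "int"
# is not classified as an identifier.
TOKEN_SPEC = [
    ("NUMBER",  r"\d+"),
    ("KEYWORD", r"\b(?:int|return)\b"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("PUNCT",   r"[=;(){}+\-*/]"),
    ("SKIP",    r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Split the input stream into (type, value) token pairs."""
    tokens = []
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind != "SKIP":          # discard whitespace
            tokens.append((kind, match.group()))
    return tokens
```

For example, `tokenize("int x = 42;")` classifies `int` as a keyword, `x` as an identifier, and `42` as a number.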
The lexer may store the recognized identifier tokens in a symbol table. But since the lexer typically does not know what an identifier represents, and since the same identifier can mean different things in different compilation scopes, it is often the parser, which has more contextual knowledge, that is responsible for building the symbol table.
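A scope-aware symbol table of the kind a parser might maintain can be sketched as a stack of dictionaries. The class and method names here are illustrative, not from any specific compiler:

```python
class SymbolTable:
    """A stack of scopes; the parser pushes and pops as it enters
    and leaves blocks, so the same name can mean different things
    in different scopes."""

    def __init__(self):
        self.scopes = [{}]          # index 0 is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def leave_scope(self):
        self.scopes.pop()

    def define(self, name, info):
        self.scopes[-1][name] = info

    def lookup(self, name):
        # Search innermost scope outward, so an inner name shadows an outer one.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None
```

With this, defining `x` globally and again inside a nested scope gives two distinct entries, and `lookup("x")` resolves to whichever is innermost at that point.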
However, in some compiler designs the lexer simply builds a list of tokens and passes it on to the parser (or the parser requests tokens from the input stream on demand). The parser in turn produces a parse tree (or sometimes an abstract syntax tree) as its output, and the symbol table is built only after parsing has completed for a given compilation unit, by traversing that tree.
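That "build the table after parsing" design can be sketched as a separate pass over an already-constructed tree. The node shapes below are illustrative tuples, not a real compiler's AST:

```python
def collect_symbols(node, table=None):
    """Walk a parse tree after parsing has finished and record
    declarations in a (deliberately simplified, flat) symbol table."""
    if table is None:
        table = {}
    kind = node[0]
    if kind == "decl":              # ("decl", name, type)
        _, name, type_ = node
        table[name] = type_
    elif kind == "block":           # ("block", child, child, ...)
        for child in node[1:]:
            collect_symbols(child, table)
    return table
```

A production compiler would of course track scopes during the traversal rather than flattening everything into one dictionary, but the separation of concerns is the same: parsing first, symbol collection as a later pass.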
Many different designs are possible.