Question

I'm reading the "Dragon Book" and I think I understand the main point of a lexer, parse tree and syntax tree and what errors they're typically supposed to catch (assuming we're using a context-free language), but I need someone to catch me if I'm wrong. My understanding is that a lexer simply tokenizes the input and catches errors that have to do with invalid constructs in the code, such as a semi-colon being used in a language that doesn't have semi-colons. The parse tree is used to verify that the syntax is followed and the code is in the correct order, and the syntax tree is used to actually evaluate the statements and expressions in the code and generate things like 3-address code or machine code. Are any of these correct?

Side-note: Are a concrete-syntax tree and a parse tree the same thing?

Side-Side note: When constructing the AST, does the entire program get built into one giant AST, or is a different AST constructed per statement/expression?


Solution

Strictly speaking, lexers are parsers too. The difference between lexers and parsers is in what they operate on. In the world of a lexer, everything is made of individual characters, which it groups into tokens by matching them against the regular grammar it understands. To a parser, the world is made up of tokens, which it assembles into a syntax tree by matching them against the context-free grammar it understands. In this sense, they are both doing the same kind of thing but on different levels. In fact, you could build parsers on top of parsers, operating at higher and higher levels, so one symbol in the highest-level grammar could represent something incredibly complex at the lowest level.
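
Here is a minimal sketch (not from the original answer) of that two-level idea: a regex-based lexer turns characters into tokens, and a recursive-descent parser turns those tokens into a tree. Names like `tokenize` and `parse_expr` are purely illustrative, not a real library's API.

```python
import re

# Lexer level: the "grammar" is regular — each token kind is a regex over characters.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("PLUS",   r"\+"),
    ("STAR",   r"\*"),
    ("SKIP",   r"\s+"),   # whitespace is dropped and never reaches the parser
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Lexer: turns a stream of characters into a list of (kind, text) tokens."""
    tokens, pos = [], 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if not m:
            raise SyntaxError(f"unexpected character {source[pos]!r}")  # lexical error
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

# Parser level: the grammar is context-free and operates on tokens, not characters.
def parse_expr(tokens, i=0):
    """expr -> term ('+' term)*"""
    node, i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i][0] == "PLUS":
        right, i = parse_term(tokens, i + 1)
        node = ("+", node, right)          # build a tree node instead of evaluating
    return node, i

def parse_term(tokens, i):
    """term -> NUMBER ('*' NUMBER)*"""
    node, i = parse_number(tokens, i)
    while i < len(tokens) and tokens[i][0] == "STAR":
        right, i = parse_number(tokens, i + 1)
        node = ("*", node, right)
    return node, i

def parse_number(tokens, i):
    if i >= len(tokens) or tokens[i][0] != "NUMBER":
        raise SyntaxError("expected a number")   # syntax error, caught by the parser
    return ("num", int(tokens[i][1])), i + 1

tree, _ = parse_expr(tokenize("1 + 2 * 3"))
print(tree)   # ('+', ('num', 1), ('*', ('num', 2), ('num', 3)))
```

The division of labour matches the answer above: a stray character the regular grammar can't match is reported by the lexer, while a well-formed token in the wrong place is reported by the parser.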

To your other questions:

  • Yes, a concrete syntax tree is a parse tree.
  • Generally, parsers build one tree for the entire file, since the whole file represents a single sentence in the CFG. This isn't necessarily always the case, though (see the sketch below).
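
As a small illustration of that last point (illustrative node names, assumed for this sketch, not taken from the Dragon Book), the AST for a whole program is typically one tree whose root node holds every top-level statement as a subtree:

```python
from dataclasses import dataclass

@dataclass
class Assign:          # e.g.  x = 1
    name: str
    value: int

@dataclass
class Print:           # e.g.  print(x)
    name: str

@dataclass
class Program:         # root node: the entire file is one sentence in the CFG
    statements: list

# Two statements, but still a single Program AST — not one AST per statement.
program_ast = Program(statements=[Assign("x", 1), Print("x")])
print(program_ast)
```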
Licensed under: CC-BY-SA with attribution