What are the general strategies for reducing a parse tree (i.e., a concrete syntax tree) to an abstract syntax tree?

For example, I have the following grammar rule:

statement_list : statement
               | statement_list statement

which, if left as a parse tree, generates deeply nested output like this:

program
        statement_list
                statement_list
                        statement
                                definition
                                        p_type
                                        assignment
                statement
                        definition
        statement
                assign
                        assignment

If I collapse these wrapper nodes, concatenating their children into the parent (since a statement list has no inherent meaning after parsing), I get the following:

program
        definition
                p_type
                assignment
        definition
        assign
                assignment

This works well; however, I'm not aware of any "rules" for doing this. Are there specific grammar rules I should be looking to simplify? Is it a matter of feel, or is there a more mechanical process?
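A minimal Python sketch of the collapsing step described above. It assumes a simple hypothetical `Node` class (not taken from any parser library) and treats `statement_list` and `statement` as transparent wrapper nodes: each transparent child is replaced by its own, already collapsed, children in order, so the left-recursive chain flattens into one list under `program`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)


# Hypothetical choice: wrapper nonterminals that carry no meaning after parsing.
TRANSPARENT = {"statement_list", "statement"}


def collapse(node: Node) -> Node:
    """Return a copy of `node` where every TRANSPARENT child is replaced
    by its own (recursively collapsed) children, spliced in order."""
    new_children: list[Node] = []
    for child in node.children:
        c = collapse(child)
        if c.label in TRANSPARENT:
            new_children.extend(c.children)  # splice the wrapper's children
        else:
            new_children.append(c)
    return Node(node.label, new_children)


def show(node: Node, depth: int = 0) -> None:
    """Print the tree with four spaces of indentation per level."""
    print("    " * depth + node.label)
    for child in node.children:
        show(child, depth + 1)


# The parse tree from the question.
tree = Node("program", [
    Node("statement_list", [
        Node("statement_list", [
            Node("statement", [
                Node("definition", [Node("p_type"), Node("assignment")]),
            ]),
        ]),
        Node("statement", [Node("definition")]),
    ]),
    Node("statement", [Node("assign", [Node("assignment")])]),
])

show(collapse(tree))
```

Running it prints the flattened tree shown in the question. The set of transparent labels is the only grammar-specific part; typically it holds nonterminals that exist only to express repetition (lists) or precedence.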


Solution

It's not a matter of "feel". An abstract syntax tree depends on the meaning (semantics) of what's been parsed, and I think these would be the rules:

  1. Remove nodes for tokens that don't add meaning. These are intermediate keywords (such as "then"), separators (such as commas), and brackets (such as parentheses).
  2. Promote meaningful tokens (such as "if") to be the parent of the other nodes in the same rule.
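The two rules can be sketched in Python as a single recursive pass. This assumes a hypothetical `Node` class in which a node with no children is a token; the `NOISE` and `OPERATORS` sets are illustrative assumptions for an `if ... then ... else` statement and an `=` assignment, not taken from any real grammar. As an extra step beyond the two rules, a node left with exactly one child is replaced by that child, which removes chains such as `stmt -> assign` or `expr -> x`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str  # rule name for interior nodes, token text for leaves
    children: list["Node"] = field(default_factory=list)


# Hypothetical token sets for an illustrative language.
NOISE = {"then", "else", "do", ",", ";", "(", ")"}  # rule 1: drop these
OPERATORS = {"if", "while", "=", "+", "*"}          # rule 2: promote these


def is_token(node: Node) -> bool:
    return not node.children


def to_ast(node: Node) -> Node:
    if is_token(node):
        return node
    # Rule 1: drop tokens that add no meaning, and convert the rest.
    kept = [to_ast(c) for c in node.children
            if not (is_token(c) and c.label in NOISE)]
    # Rule 2: a single meaningful token becomes the parent of its siblings.
    ops = [c for c in kept if is_token(c) and c.label in OPERATORS]
    if len(ops) == 1:
        op = ops[0]
        return Node(op.label, [c for c in kept if c is not op])
    # Extra step (assumption): collapse single-child chains.
    if len(kept) == 1:
        return kept[0]
    return Node(node.label, kept)


def show(node: Node, depth: int = 0) -> None:
    print("    " * depth + node.label)
    for child in node.children:
        show(child, depth + 1)


# Parse tree for: if (x) then y = 1 else y = 2
tree = Node("if_stmt", [
    Node("if"), Node("("), Node("expr", [Node("x")]), Node(")"),
    Node("then"),
    Node("stmt", [Node("assign", [Node("y"), Node("="), Node("1")])]),
    Node("else"),
    Node("stmt", [Node("assign", [Node("y"), Node("="), Node("2")])]),
])

show(to_ast(tree))
```

The parentheses, `then` and `else` disappear, and `if` and `=` become interior nodes, giving `if(x, =(y, 1), =(y, 2))`. Distinguishing tokens from rules by "has no children" keeps the sketch short; a real implementation would more likely store the token type explicitly.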

There's no single recipe. It depends on what the phrases in the target language mean.

Licensed under: CC-BY-SA with attribution