Question

I'm developing a parser for learning purposes that follows some user-defined rules to create ASTs. During AST generation, however, I can't find a way to abstract the tree - all tokens/intermediate node types are maintaned in the final tree structure.

It's easier to explain with an example:

User-defined structures:

struct ASTExpr { };
struct ASTNumber : ASTExpr  { int value; };
struct ASTAddition : ASTExpr { ASTExpr* rhs, lhs; };

User-defined rules:

[0..9]              -> ASTNumber
ASTExpr [+] ASTExpr -> ASTAddition
ASTNumber           -> ASTExpr
ASTAddition         -> ASTExpr

Example input:

1 + 5

Example parsing process/output:

          (ASTExpr)
              |
        (ASTAddition)
              |
      /----------------\
      |       |        |
  (ASTExpr)  [+]   (ASTExpr)
      |                |
 (ASTNumber)      (ASTNumber)
      |                |
     [1]              [5]

As you can see the tree is very complex and not very "abstract". The (...) nodes are classes, the [...] nodes are tokens.

This is what I would prefer:

        (ASTAddition)
              |
      /----------------\
      |                |
 (ASTNumber)      (ASTNumber)

In the desired version of the tree, the tokens and the intermediate ASTExpr nodes are not present.


How can I make my AST tree more "abstract"? Do I have to change the ruleset? What is usually done to ensure that the final AST contains less noise?

Était-ce utile?

La solution

In the end, there's no unique grammar. Consider even the simple */+- grammar; do you want to specify that the two sides of + can be numbers, products or ratios? It adds up quickly:

  • 1+1
  • 1+1*1
  • 1+1/1
  • 1*1+1
  • 1*1+1*1
  • 1*1+1/1
  • 1/1+1
  • 1/1+1*1
  • 1/1+1/1

Using an intermediate rule keeps the grammar smaller at the expense of a larger AST. But as you noted, you can trivially transform the AST, in effect transforming the underlying grammar as well.

You could cut out pretty much any class at will. [1] ASTAddition [5] would be a valid AST, too, for a grammar with a quite complex ASTAddition but few other rules. What makes sense for you?

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top