Frage

I'm developing a parser for learning purposes that follows some user-defined rules to create ASTs. During AST generation, however, I can't find a way to abstract the tree - all tokens/intermediate node types are maintaned in the final tree structure.

It's easier to explain with an example:

User-defined structures:

struct ASTExpr { };
struct ASTNumber : ASTExpr  { int value; };
struct ASTAddition : ASTExpr { ASTExpr* rhs, lhs; };

User-defined rules:

[0..9]              -> ASTNumber
ASTExpr [+] ASTExpr -> ASTAddition
ASTNumber           -> ASTExpr
ASTAddition         -> ASTExpr

Example input:

1 + 5

Example parsing process/output:

          (ASTExpr)
              |
        (ASTAddition)
              |
      /----------------\
      |       |        |
  (ASTExpr)  [+]   (ASTExpr)
      |                |
 (ASTNumber)      (ASTNumber)
      |                |
     [1]              [5]

As you can see the tree is very complex and not very "abstract". The (...) nodes are classes, the [...] nodes are tokens.

This is what I would prefer:

        (ASTAddition)
              |
      /----------------\
      |                |
 (ASTNumber)      (ASTNumber)

In the desired version of the tree, the tokens and the intermediate ASTExpr nodes are not present.


How can I make my AST tree more "abstract"? Do I have to change the ruleset? What is usually done to ensure that the final AST contains less noise?

War es hilfreich?

Lösung

In the end, there's no unique grammar. Consider even the simple */+- grammar; do you want to specify that the two sides of + can be numbers, products or ratios? It adds up quickly:

  • 1+1
  • 1+1*1
  • 1+1/1
  • 1*1+1
  • 1*1+1*1
  • 1*1+1/1
  • 1/1+1
  • 1/1+1*1
  • 1/1+1/1

Using an intermediate rule keeps the grammar smaller at the expense of a larger AST. But as you noted, you can trivially transform the AST, in effect transforming the underlying grammar as well.

You could cut out pretty much any class at will. [1] ASTAddition [5] would be a valid AST, too, for a grammar with a quite complex ASTAddition but few other rules. What makes sense for you?

Lizenziert unter: CC-BY-SA mit Zuschreibung
Nicht verbunden mit StackOverflow
scroll top