ANTLR3 Hetero nodes are not created

https://stackoverflow.com/questions/7658311

06-02-2021
Question

I am trying to create a heterogeneous tree based on a sample provided here: http://www.antlr.org/wiki/display/ANTLR3/Tree+construction#Treeconstruction-Heterogeneoustreenodes

I have created a grammar file as follows:

grammar T; 

options { 
  language=CSharp3; 
  ASTLabelType=CommonTree;
  output=AST; 
  TokenLabelType=CommonToken;
  k=3;
} 

tokens { 
  ROOT; 
  UNARY_MIN; 
} 

@lexer::header 
{
  using System;
  using System.Text;
  using System.Collections;
  using System.Collections.Generic;
  using ANTLRSandbox.Criteria;
}

@parser::header 
{
  using System;
  using System.Text;
  using System.Collections;
  using System.Collections.Generic;
  using ANTLRSandbox.Criteria;
}


@parser::namespace { ANTLRSandbox } 
@lexer::namespace { ANTLRSandbox } 


public
parse 
  :  exp EOF -> ^(ROOT<RootNode> exp) 
  ; 

exp 
  :  addExp 
  ; 

addExp 
  :  mulExp (('+'<PlusNode> | '-'<MinusNode>)^ mulExp)* 
  ; 

mulExp 
  :  unaryExp (('*' | '/')^ unaryExp)* 
  ; 

unaryExp 
  :  '-' atom -> ^(UNARY_MIN atom) 
  |  atom 
  ; 

atom 
  :  Number 
  |  '(' exp ')' -> exp 
  ; 

Number 
  :  ('0'..'9')+ ('.' ('0'..'9')+)? 
  ; 

Space  
  :  (' ' | '\t' | '\r' | '\n'){Skip();} 
  ;

And the node classes look like this:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;
using Antlr.Runtime;
using Antlr.Runtime.Tree;

namespace ANTLRSandbox.Criteria
{
  public class RootNode : CommonTree
  {
    public RootNode(int ttype) { }
    public RootNode(int ttype, IToken t) { }
    public RootNode(IToken t) { }
  }
}

The PlusNode and MinusNode classes are identical to RootNode, so I won't post them here.

And here is how I create the actual tree:

    string s = "(12.5 + 56 / -7) * 0.5";

    ANTLRStringStream Input = new ANTLRStringStream(s);
    TLexer Lexer = new TLexer(Input);
    CommonTokenStream Tokens = new CommonTokenStream(Lexer);
    TParser Parser = new TParser(Tokens);

    TParser.parse_return ParseReturn = Parser.parse();
    CommonTree Tree = (CommonTree)ParseReturn.Tree;

The code runs without any error, but when I inspect the Tree object in the debugger, all of its nodes are of type CommonTree, and none of the breakpoints I placed in the PlusNode, MinusNode, and RootNode constructors are hit.

I have followed the sample on the ANTLR3 wiki page and couldn't find any other sample on the web. I know they intend to drop this approach at some point (I found this in the ANTLR3 preview notes), but this implementation suits me better (I need to create different object types based on grammar context).

So, any hints? Am I missing something, such as an option or flag to put in the grammar definition file?

Thanks! D.

Solution

I've never had much luck getting the < ... > node-type operators to work together with the inline tree operators (^ for making a root and ! for excluding an element from the tree). All I can recommend is to use rewrite rules (the ... -> ^(...) form) on the right-hand side of your parser rules, and to specify the custom node, <NodeName>, only in the rewrite rule, not on both sides of the -> as the Wiki says. I suspect the Wiki info is a bit outdated. I realize that expression rules like these are far more readable with inline operators than with rewrite rules.

I'm not too fluent in C#, so here's a Java demo:

T.g

grammar T; 

options { 
  ASTLabelType=CommonTree;
  output=AST; 
} 

tokens {
  ROOT;
  UNARY_MIN;
}

@members {

  public static class RootNode extends CommonTree {
    public RootNode(Token t) { token=t; }
    public RootNode(int ttype) { super(new CommonToken(ttype, "ROOT")); }
    public RootNode(RootNode node) { super(node); }
    public Tree dupNode() { return new RootNode(this); } 
    public String toString() { return "RootNode=" + token.getText(); }
  }

  public static class MinusNode extends CommonTree {
    public MinusNode(Token t) { token=t; }
    public MinusNode(MinusNode node) { super(node); }
    public Tree dupNode() { return new MinusNode(this); } 
    public String toString() { return "MinusNode=" + token.getText(); }
  }

  public static class PlusNode extends CommonTree {
    public PlusNode(Token t) { token=t; }
    public PlusNode(PlusNode node) { super(node); }
    public Tree dupNode() { return new PlusNode(this); } 
    public String toString() { return "PlusNode=" + token.getText(); }
  }
}

parse 
  :  exp EOF -> ^(ROOT<RootNode> exp)
  ; 

exp 
  :  addExp 
  ; 

addExp 
  :  (mulExp -> mulExp) ( '+' m=mulExp -> ^('+'<PlusNode>  $m $addExp)
                        | '-' m=mulExp -> ^('-'<MinusNode> $m $addExp)
                        )* 
  ; 

mulExp 
  :  unaryExp (('*' | '/')^ unaryExp)* 
  ; 

unaryExp 
  :  '-' atom -> ^(UNARY_MIN atom) 
  |  atom 
  ; 

atom 
  :  Number 
  |  '(' exp ')' -> exp 
  ; 

Number 
  :  ('0'..'9')+ ('.' ('0'..'9')+)? 
  ; 

Space  
  :  (' ' | '\t' | '\r' | '\n') {skip();} 
  ;

Main.java

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {

  private static void traverse(CommonTree tree, int indent) {
    if(tree == null) return;
    for(int i = 0; i < indent; i++) System.out.print("  ");
    System.out.println(tree.getClass().getName() + " -> " + tree.getText());
    for(int i = 0; i < tree.getChildCount(); i++) {
      traverse((CommonTree)tree.getChild(i), indent + 1);
    }
  }

  public static void main(String[] args) throws Exception {
    TLexer lexer = new TLexer(new ANTLRStringStream("1 + 2 - 3"));
    TParser parser = new TParser(new CommonTokenStream(lexer));
    CommonTree tree = (CommonTree)parser.parse().getTree();
    traverse(tree, 0);
  }
}

Running the demo:

java -cp antlr-3.3.jar org.antlr.Tool T.g 
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main

will print:

TParser$RootNode -> ROOT
  TParser$MinusNode -> -
    org.antlr.runtime.tree.CommonTree -> 3
    TParser$PlusNode -> +
      org.antlr.runtime.tree.CommonTree -> 2
      org.antlr.runtime.tree.CommonTree -> 1
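
Each node class in the demo overrides dupNode(). To my understanding, that matters because CommonTree's own dupNode() returns a plain CommonTree, so any place where the runtime duplicates a node would otherwise turn a custom node back into a CommonTree. The sketch below illustrates this without the ANTLR runtime. BaseNode, KeepsTypeNode, LosesTypeNode and DupNodeSketch are hypothetical stand-ins invented for this illustration, not ANTLR classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ANTLR's CommonTree: text, children, and a dupNode() hook.
class BaseNode {
    final String text;
    final List<BaseNode> children = new ArrayList<>();

    BaseNode(String text) { this.text = text; }

    // Copy constructor: copies the payload only, not the children.
    BaseNode(BaseNode other) { this.text = other.text; }

    // Default duplication always yields a plain BaseNode.
    BaseNode dupNode() { return new BaseNode(this); }

    void addChild(BaseNode child) { children.add(child); }
}

// Overrides dupNode(), as the node classes in the demo do.
class KeepsTypeNode extends BaseNode {
    KeepsTypeNode(String text) { super(text); }
    KeepsTypeNode(KeepsTypeNode other) { super(other); }
    @Override BaseNode dupNode() { return new KeepsTypeNode(this); }
}

// Does not override dupNode(), so copies fall back to BaseNode.
class LosesTypeNode extends BaseNode {
    LosesTypeNode(String text) { super(text); }
}

class DupNodeSketch {
    // Recursively duplicates a tree, creating every copy through dupNode().
    static BaseNode dupTree(BaseNode root) {
        BaseNode copy = root.dupNode();
        for (BaseNode child : root.children) {
            copy.addChild(dupTree(child));
        }
        return copy;
    }

    public static void main(String[] args) {
        BaseNode plus = new KeepsTypeNode("+");
        plus.addChild(new LosesTypeNode("1"));
        BaseNode copy = dupTree(plus);
        // The root keeps its subclass; the child does not.
        System.out.println(copy.getClass().getSimpleName());
        System.out.println(copy.children.get(0).getClass().getSimpleName());
    }
}
```

Running DupNodeSketch prints KeepsTypeNode for the copied root and BaseNode for the copied child, which is the reason to keep the dupNode() overrides when adapting the demo classes.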

OTHER TIPS

I just received an answer that works from the main contributor of the CSharp3 target. Basically, when specifying a node type, you have to explicitly use node=; you cannot rely on the implicit behavior as documented. For example, you would need to change this:

parse 
  :  exp EOF -> ^(ROOT<RootNode> exp)
  ;

...to this:

parse 
  :  exp EOF -> ^(ROOT<node=RootNode> exp)
  ;

In my own grammar, once I made this change to my rewrite rules, the parser finally output the heterogeneous nodes.
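
Applied to the additive rule from the question, the same change might look like the fragment below. This is an untested sketch. It assumes that the rewrite-rule form of addExp from the Java demo in the Solution carries over to the CSharp3 target, and that node= works on literal tokens such as '+' in the same way as on the imaginary ROOT token.

```
addExp 
  :  (mulExp -> mulExp) ( '+' m=mulExp -> ^('+'<node=PlusNode>  $m $addExp)
                        | '-' m=mulExp -> ^('-'<node=MinusNode> $m $addExp)
                        )* 
  ;
```

If the nodes still come out as CommonTree, the generated parser source is the place to check whether PlusNode and MinusNode are actually being instantiated.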

Licensed under: CC-BY-SA with attribution
