ANTLR - grandchild nodes in tree construction

Question 1

I think that there is an easier answer than the other one here:

token { DECLS; ASSIGNS; }

prog: (d+=decl | a+=assign)* EOF -> ^(DECLS $d*) ^(ASSIGNS $a*) ;

...

Which can be adapted for as many rules as you like of course.

However, are you sure you need to do this? Why not just build the symbol table of DECL instructions in the parser, and then only build an AST of ASSIGNs, which you can check in the tree walk.

Jim

Question 2

Can I perform this rearrangement during tree construction, or should I write a tree grammar?

It's possible to do either, but I recommend grouping the nodes during token parsing.

I haven't been happy with any tree-rewriting grammars I've written that group nodes because those grammars have to rediscover where each groupable node is at -- hence the need for grouping. The token parser touches all that data during regular processing, and the tree grammar ends up walking the tree for those nodes exactly like the token parser already walked its input for tokens. I don't think the tree parser is worth the hassle if it's just for grouping.

Anyway, managing the grouping in the parser boils down to saving off the decl and assign nodes after they're produced then pushing them out again when their grouping level occurs. Here's a quick example.

Declarative.g

grammar Declarative;

options { 
    output = AST;
}

tokens { 
    PROGRAM; DECLARATIONS; ASSIGNMENTS;
} 

@parser::header { 
    import java.util.ArrayList;
}

@members {
    private ArrayList<Object> assigns = new ArrayList<Object>(); 
    private ArrayList<Object> decls = new ArrayList<Object>();
    
    private Object createTree(int ttype,  ArrayList<Object> children) {
        Object tree = adaptor.create(ttype, tokenNames[ttype]);
        for (Object child : children){
            adaptor.addChild(tree, child);
        }
        
        return tree; 
    }
}

compilationUnit     : statement* EOF -> ^(PROGRAM {createTree(DECLARATIONS, decls)} {createTree(ASSIGNMENTS, assigns)}); 

statement           : decl {decls.add($decl.tree);}
                    | assign {assigns.add($assign.tree);}
                    ;
                    
decl                : DECL^ ID;
assign              : ASSIGN^ ID INT;
                    
DECL    : 'decl';
ASSIGN  : 'assign';
ID      : ('a'..'z'|'A'..'Z')('a'..'z'|'A'..'Z')*;
INT     : ('0'..'9')+;                  
WS      : (' '|'\t'|'\f'|'\n'|'\r'){skip();};

Each decl node is saved off by the statement rule in the decls list, and similarly for for each assign node.

Method createTree uses the parser's TreeAdaptor to build the group nodes and populate them.

CommonTree tree = (CommonTree) adaptor.create(ttype, tokenNames[ttype]);
for (Object child : children){
    adaptor.addChild(tree, child);
}

return tree;

The production for compilationUnit is ^(PROGRAM {createTree(DECLARATIONS, decls)} {createTree(ASSIGNMENTS, assigns)}), which adds the grouping nodes to PROGRAM. Method createTree is used to build the grouping node and its children in one go.

There may be a tricky way to get ANTLR to pull it all together for you, but this works and is fairly self-explanatory.

So given this input...

decl x
assign y 2
assign x 1
decl y

... the token parser produced for the grammar above produces this tree as output:

(PROGRAM 
    (DECLARATIONS 
        (decl x) 
        (decl y)) 
    (ASSIGNMENTS 
        (assign y 2) 
        (assign x 1)))