Can I perform this rearrangement during tree construction, or should I write a tree grammar?
It's possible to do either, but I recommend grouping the nodes during token parsing.
I haven't been happy with any tree-rewriting grammars I've written that group nodes, because those grammars have to rediscover where each groupable node is (the nodes are scattered, which is why they need grouping in the first place). The token parser touches all that data during regular processing, and the tree grammar ends up walking the tree for those nodes exactly as the token parser already walked its input for tokens. I don't think the tree parser is worth the hassle if it's just for grouping.
Anyway, managing the grouping in the parser boils down to saving off the decl and assign nodes after they're produced, then pushing them out again when their grouping level occurs. Here's a quick example.
Declarative.g
grammar Declarative;

options {
    output = AST;
}

tokens {
    PROGRAM; DECLARATIONS; ASSIGNMENTS;
}

@parser::header {
    import java.util.ArrayList;
}

@members {
    // Subtrees saved by the statement rule until compilationUnit groups them.
    private ArrayList<Object> assigns = new ArrayList<Object>();
    private ArrayList<Object> decls = new ArrayList<Object>();

    // Builds a node of the given imaginary token type and attaches the saved children.
    private Object createTree(int ttype, ArrayList<Object> children) {
        Object tree = adaptor.create(ttype, tokenNames[ttype]);
        for (Object child : children) {
            adaptor.addChild(tree, child);
        }
        return tree;
    }
}

compilationUnit
    : statement* EOF
      -> ^(PROGRAM {createTree(DECLARATIONS, decls)} {createTree(ASSIGNMENTS, assigns)})
    ;

statement
    : decl   {decls.add($decl.tree);}
    | assign {assigns.add($assign.tree);}
    ;

decl   : DECL^ ID;
assign : ASSIGN^ ID INT;

DECL   : 'decl';
ASSIGN : 'assign';
ID     : ('a'..'z'|'A'..'Z')('a'..'z'|'A'..'Z')*;
INT    : ('0'..'9')+;
WS     : (' '|'\t'|'\f'|'\n'|'\r') {skip();};
Each decl node is saved off by the statement rule in the decls list, and similarly each assign node is saved in the assigns list.
Method createTree uses the parser's TreeAdaptor to build the group nodes and populate them.
    Object tree = adaptor.create(ttype, tokenNames[ttype]);
    for (Object child : children) {
        adaptor.addChild(tree, child);
    }
    return tree;
The rewrite for compilationUnit is ^(PROGRAM {createTree(DECLARATIONS, decls)} {createTree(ASSIGNMENTS, assigns)}), which adds the grouping nodes to PROGRAM. Each call to createTree builds one grouping node and its children in one go.
There may be a tricky way to get ANTLR to pull it all together for you, but this works and is fairly self-explanatory.
So given this input...
decl x
assign y 2
assign x 1
decl y
... the token parser generated from the grammar above produces this tree as output:
(PROGRAM
    (DECLARATIONS
        (decl x)
        (decl y))
    (ASSIGNMENTS
        (assign y 2)
        (assign x 1)))
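If it helps to see the save-then-regroup idea without ANTLR in the way, here's a minimal plain-Java sketch of it. Everything in it is an assumption made for illustration: Node, createGroup, and group are hypothetical names, not ANTLR API, and Node is only a stand-in for whatever tree type the adaptor would really produce. It sorts already-built statement nodes into two lists and then hangs each list under its own group node, roughly the way the statement and compilationUnit rules do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: models the grouping step only, with no ANTLR dependency.
class GroupingSketch {

    // Stand-in for an AST node (the real grammar goes through ANTLR's tree adaptor).
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text, Node... kids) {
            this.text = text;
            children.addAll(Arrays.asList(kids));
        }

        // Renders the node in the same parenthesized form as the tree shown above.
        @Override
        public String toString() {
            if (children.isEmpty()) {
                return text;
            }
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node child : children) {
                sb.append(' ').append(child);
            }
            return sb.append(')').toString();
        }
    }

    // Plays the role of createTree: one group node plus the children saved for it.
    static Node createGroup(String name, List<Node> saved) {
        Node group = new Node(name);
        group.children.addAll(saved);
        return group;
    }

    // Plays the role of the statement and compilationUnit rules:
    // save each node in the list for its kind, then emit both groups under PROGRAM.
    static Node group(List<Node> statements) {
        List<Node> decls = new ArrayList<>();
        List<Node> assigns = new ArrayList<>();
        for (Node statement : statements) {
            if (statement.text.equals("decl")) {
                decls.add(statement);
            } else {
                assigns.add(statement);
            }
        }
        return new Node("PROGRAM",
                createGroup("DECLARATIONS", decls),
                createGroup("ASSIGNMENTS", assigns));
    }

    public static void main(String[] args) {
        List<Node> statements = Arrays.asList(
                new Node("decl", new Node("x")),
                new Node("assign", new Node("y"), new Node("2")),
                new Node("assign", new Node("x"), new Node("1")),
                new Node("decl", new Node("y")));
        System.out.println(group(statements));
    }
}
```

Within each group the nodes keep their original input order, which is also what the two lists in the grammar give you.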