Personally, I like the idea of the visitor pattern, however it might turn out to expensive to visit all nodes.
Query Interface: If users / other developers are using it, I would use a builder style interface, with readable method names:
Visitor v = QueryBuilder
.selectNodes(ConditionFactory.hasMass(2.5))
.withChildren(ConditionFactory.noAttributes())
.buildVisitor();
node.accept(v);
List<Set<Node>> validPaths = v.getValidPaths();
As pointed out above, this is more or less just syntactic sugar for what you already have (but sugar makes all the difference). I would separate the code for "moving on the graph" (like "check whether visited node fulfills condition" or "check whether connected nodes fulfill condition") from the code that actually checks (or is) a condition. Also, use composites on conditions to build and/or:
// Select nodes with mass 2.5, follow edges with both conditions fulfilled and check that the children on these edges have no attributes.
Visitor v = QueryBuilder
.selectNodes(ConditionFactory.hasMass(2.5))
.withEdges(ConditionFactory.and(ConditionFactory.freestyle("att1 > 12"), ConditionFactory.freestyle("att2 > 23"))
.withChildren(ConditionFactory.noAttributes())
.buildVisitor();
(I used "freestyle" because of missing creativity right now, but the intention of it should be clear) Node that in general this might be two different interfaces in order to not build strange queries.
public interface QueryBuilder {
QuerySelector selectNodes(Condition c);
QuerySelector allNodes();
}
public interface QuerySelector {
QuerySelector withEdges(Condition c);
QuerySelector withChildren(Condition c);
QuerySelector withHyperChildren(Condition c);
// ...
QuerySelector and(QuerySelector... selectors);
QuerySelector or(QuerySelector... selectors);
Visitor buildVisitor();
}
Using this kind of syntactic sugar makes the queries readable from the source code without forcing you to implement your own data query language. The QuerySelector
implementations would than be responsible for "moving" around the visited nodes whereas the Conditition
implementation would check whether the condition match.
The clear downside of this approach is, that you need to foresee most of the queries in interfaces and need to implement them already.
Scalability with number of nodes: You might need to add some kind of index to speed up finding "interesting" nodes. One idea which is popping up is to add (for each index) a layer to the graph in which each nodes models one of the different attribute settings for the "indexed variable". The normal edges could then connect these index nodes with the nodes in the original graph. The hyper edges on the index could then build a network which is smaller to search on. Of course there is still the boring way of storing the index in a map-like structure with a attributeValue -> node
mapping. Which probably is much more performant than the idea above anyway.
If you have some kind of Index make sure that the index can as well receive a visitor such that it does not have to visit all nodes in the graph.