Frage

I'm creating an API that encapsulates JPA objects with additional properties and helpers. I do not want the users to access the database, because I have to provide certain querying functionality for the consumers of the API.

I have the following:

Node1(w/ attributes) -- > Edge1(w/ attr.) -- > Node2(w/ attr.)

and

Node1(w/ attributes) -- > |
Node2(w/ attributes) -- > |  -- > HyperEdge1(w/ attr.)
Node3(w/ attributes) -- > |

Node/Edge/HyperEdge example

Basically a Node can be of a certain type, which would dictate the kind of attributes available. So I need to be able to query these "paths" depending on different types and attributes.

For example: Start from a Node, and find a path typeA > typeB & attr1 > typeC.

So I need to do something simple, and be able to write the query as a string, or maybe a builder pattern style.

What I have so far, is a visitor pattern set up to traverse the Nodes/Edges/HyperEdges, and this allows for a sort of querying, but it's not very simple, since you have to create a new visitor for new types of queries.

This is my implementation so far:

    ConditionImpl hasMass = ConditionFactory.createHasMass( 2.5 );
    ConditionImpl noAttributes = ConditionFactory.createNoAttributes();

    List<ConditionImpl> conditions = new ArrayList<ConditionImpl>();
    conditions.add( hasMass );
    conditions.add( noAttributes );

    ConditionVisitor conditionVisitor = new ConditionVisitor( conditions );
    node.accept( conditionVisitor );

    List<Set<Node>> validPaths = conditionVisitor.getValidPaths();

The code above, does a query that checks if the starting node has a mass of 2.5 and a linked node (child) has no attributes. The visitor does a condition.check( Node ) and returns a boolean.


Where do I start with creating a querying language for a graph that is simpler? Note: I do not have the option of using an existing graph library and I will have hundreds of thousands of nodes, plus the edges..

War es hilfreich?

Lösung

Personally, I like the idea of the visitor pattern, however it might turn out to expensive to visit all nodes.

Query Interface: If users / other developers are using it, I would use a builder style interface, with readable method names:

    Visitor v = QueryBuilder
                  .selectNodes(ConditionFactory.hasMass(2.5))
                  .withChildren(ConditionFactory.noAttributes())
                  .buildVisitor();
    node.accept(v);
    List<Set<Node>> validPaths = v.getValidPaths();

As pointed out above, this is more or less just syntactic sugar for what you already have (but sugar makes all the difference). I would separate the code for "moving on the graph" (like "check whether visited node fulfills condition" or "check whether connected nodes fulfill condition") from the code that actually checks (or is) a condition. Also, use composites on conditions to build and/or:

    // Select nodes with mass 2.5, follow edges with both conditions fulfilled and check that the children on these edges have no attributes. 
    Visitor v = QueryBuilder
                  .selectNodes(ConditionFactory.hasMass(2.5))
                  .withEdges(ConditionFactory.and(ConditionFactory.freestyle("att1 > 12"), ConditionFactory.freestyle("att2 > 23")) 
                  .withChildren(ConditionFactory.noAttributes())
                  .buildVisitor();

(I used "freestyle" because of missing creativity right now, but the intention of it should be clear) Node that in general this might be two different interfaces in order to not build strange queries.

    public interface QueryBuilder {
         QuerySelector selectNodes(Condition c);
         QuerySelector allNodes();
    }

    public interface QuerySelector {
         QuerySelector withEdges(Condition c);
         QuerySelector withChildren(Condition c);
         QuerySelector withHyperChildren(Condition c);
         // ...
         QuerySelector and(QuerySelector... selectors);
         QuerySelector or(QuerySelector... selectors);

         Visitor buildVisitor();
    }       

Using this kind of syntactic sugar makes the queries readable from the source code without forcing you to implement your own data query language. The QuerySelector implementations would than be responsible for "moving" around the visited nodes whereas the Conditition implementation would check whether the condition match.

The clear downside of this approach is, that you need to foresee most of the queries in interfaces and need to implement them already.

Scalability with number of nodes: You might need to add some kind of index to speed up finding "interesting" nodes. One idea which is popping up is to add (for each index) a layer to the graph in which each nodes models one of the different attribute settings for the "indexed variable". The normal edges could then connect these index nodes with the nodes in the original graph. The hyper edges on the index could then build a network which is smaller to search on. Of course there is still the boring way of storing the index in a map-like structure with a attributeValue -> node mapping. Which probably is much more performant than the idea above anyway.

If you have some kind of Index make sure that the index can as well receive a visitor such that it does not have to visit all nodes in the graph.

Andere Tipps

It sounds like you have all the pieces except some syntactic sugar.

How about an immutable style where you create the whole list above like

Visitor v = Visitor.empty
    .hasMass(2.5)
    .edge()
    .node()
    .hasNoAttributes();

You can create any kind of linear query pattern using this style; and if you add a some extra state you could even do branching queries by e.g. setName("A") and later .node("A") to return to that point of the query.

Lizenziert unter: CC-BY-SA mit Zuschreibung
Nicht verbunden mit StackOverflow
scroll top