Question

I have a graph database with defined nodes. I want to find end nodes, given my start node and the last edge( on which the end node exists ). For example:

A-(knows)->B-(likes)->C-(shares)->D where the (....) is the relationship

Now if I want to ask a question such that given a node 'A' give me the nodes which it directly or indirectly 'shares' with. And that should reply with 'D'. So essentially given a start node, and the end edge I can find the node I want to.

My question is, is this possible in Cypher( or perhaps gremlin ) and if so, whats the right way to do it?

Was it helpful?

Solution 2

The pattern 'start node and last edge are known, find end node' could be stated in Cypher as

START a=node:nodeIndex({indexQueryParam})
MATCH a-[?*]->()-[:SHARES]->d
RETURN d

The ? states that this part of the pattern is optional, and * that it can be of any length. You may not need both, since variable length can mean a length of zero, but keep both in mind as you flesh out your pattern further. Since you don't really want the B node, you don't have to bind it--empty parentheses is fine.

However, this pattern is very general, and depending on your data it can be expensive (it's a bit like fishing with dynamite). You give examples of other relationship types you might expect to find in the graph, KNOWS and LIKES. Anything like that which you can introduce to specify your pattern will make your query perform better.

MATCH a-[?:KNOWS|LIKES*]->()-[:SHARES]->d

or if you know in which order these will occur

MATCH a-[?:KNOWS]->()-[LIKES*]->()-[:SHARES]->d

here the KNOWS part of the pattern is only one deep, while the LIKES part is zero to infinity deep. The KNOWS part is optional, which means that the entire part of the pattern linked to a through this optional part, is also optional.

Finally, it is usually not a good idea to let the variable depth range between zero and infinity. Introduce an upper and or lower limit that makes sense for your data, like so

MATCH a-[?:KNOWS]->()-[LIKES*1..4]->()-[:SHARES]->d

You will have to look at the patterns that you find in (or impose on) your data, and develop your cypher query patterns accordingly, always specifying the cypher pattern as much as you can, leaving only those parts undefined which you want to be filled in by the graph.

OTHER TIPS

Here's the Gremlin with all the same caveats as the accepted answer from jjaderberg (i.e. expensive so trying to limit what you touch in the graph will likely be necessary):

gremlin> g = new TinkerGraph()                                                                                   
==>tinkergraph[vertices:0 edges:0]
gremlin> g.loadGraphML('data/graph-example-2.xml')                                                               
==>null
gremlin> ends=[] as Set;g.v(1).as('x').outE.as('e').inV.sideEffect{v,m->if (m.e.label=="followed_by") {ends<<v}}.loop('x'){it.loops<3}.iterate()                 
==>null
gremlin> ends        
==>v[2]
==>v[3]
==>v[4]
==>v[5]
==>v[6]
==>v[50]
...

To break that statement down a bit, I basically construct a Set called ends within which we will aggregate the unique vertices found at the end of "followed_by" edges. We start from vertex 1 and traverse out on all edges to the vertices...essentially this portion of the Gremlin:

g.v(1).as('x').outE.as('e').inV

Then I sideEffect those vertices into the Set if they came by way of a "followed_by" label. The two parameter step closure for sideEffect is sometimes overlooked in Gremlin I think...you can read more about it here. The statement ends by looping back over the pipeline to traverse out further from these vertices. I force a break of the loop at 3 steps.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top