Question

I have four types of nodes: S, G, R and C.

S nodes have an idStr property that identifies them.

Every node of type G uses exactly one S node: (:G)-[:USES]->(:S)

Every node of type C may be connected to multiple R or G nodes: (:C)-[:CONNECTED_TO]->(:R|:G)

Every node of type R may be connected to multiple R or G nodes: (:R)-[:CONNECTED_TO]->(:R|:G)

Question:

Given an idStr range, I want to get all R and C nodes that are connected (directly or indirectly) only to G nodes that use S nodes with an idStr in that range.
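To make the requirement concrete, here is a minimal Python sketch of it on a tiny in-memory graph. It is a model of the intended result, not a Cypher query, and every node name and idStr value in it (c1, r1, g_bad, "zz", ...) is made up for illustration. It also assumes a node must reach at least one G node to qualify.

```python
# Hypothetical sample data: names such as "c1" or "g_bad" are invented for this sketch.
USES = {"g1": "1a", "g2": "b2", "g_bad": "zz"}   # G node -> idStr of the S node it uses
CONNECTED_TO = {                                  # C or R node -> direct CONNECTED_TO targets
    "c1": ["r1"],
    "r1": ["g1", "g2"],
    "c2": ["r2", "g1"],
    "r2": ["g_bad"],
}

def reachable_gs(node):
    """Return every G node reachable from `node` via one or more CONNECTED_TO hops."""
    seen, found = set(), set()
    stack = list(CONNECTED_TO.get(node, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in USES:              # G nodes are exactly the keys of USES
            found.add(current)
        else:                            # an R node: keep following its connections
            stack.extend(CONNECTED_TO.get(current, []))
    return found

def wanted_nodes(id_range):
    """C and R nodes that reach at least one G node and only G nodes whose S is in id_range."""
    allowed = {g for g, id_str in USES.items() if id_str in id_range}
    result = set()
    for node in CONNECTED_TO:
        gs = reachable_gs(node)
        if gs and gs <= allowed:         # non-empty and fully inside the allowed set
            result.add(node)
    return result

print(sorted(wanted_nodes({"1a", "b2", "something"})))  # ['c1', 'r1']
```

Here c2 is excluded because it also reaches g_bad through r2, even though it is directly connected to g1.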

The closest approach I have achieved is:

MATCH (a:S)<-[:USES]-(b:G)<-[:CONNECTED_TO*]-(n:C)
WHERE a.idStr IN ['1a','b2','something']
WITH COLLECT(DISTINCT b) AS GroupGs
MATCH p=(n)-[:CONNECTED_TO*]->(c:G)
WITH FILTER(x IN NODES(p) WHERE NOT x:G) AS cs,GroupGs,COLLECT(c) AS gs
WHERE ALL(x IN gs WHERE x IN GroupGs)
RETURN cs

But it still returns some nodes that are connected to G nodes using S nodes outside the range. [Neo4j Console Test]

What am I trying to do?

The first MATCH gets two things: the G nodes that use S nodes with an idStr in the given range (GroupGs), and the C nodes that are connected to those G nodes.

Once we have that, we have to check whether those C nodes are also connected to other G nodes (directly or through R nodes). That is the second MATCH.

Then, for each C node, we have to check whether all the G nodes connected to it (directly or through R nodes) are in GroupGs. If so, that C node (and the R nodes on the paths to the G nodes) is a match, and that is what I am trying to get.

Second approach (suggested by @FrobberOfBits)

Trying to use just one match, so we are sure the n node is the same in the matching:

MATCH (a:S)<-[:USES]-(b:G)<-[:CONNECTED_TO*]-(n:C), p=(n)-[:CONNECTED_TO*]->(c:G)
WHERE a.idStr IN ['1a','b2','something']
WITH COLLECT(DISTINCT b) AS GroupGs, FILTER(x IN NODES(p) WHERE NOT x:G) AS cs,COLLECT(c) AS gs
WHERE ALL(x IN gs WHERE x IN GroupGs)
RETURN cs

The result is the same. [Neo4j Console Test]

Third approach (suggested by @FrobberOfBits)

Giving semantics to the problem, C may be an endpoint in a network, R a repeater, G a gateway and S a Sim card.

Sim nodes have an iccid property that identifies them.

Every node of type Gateway uses exactly one Sim node: (:Gateway)-[:USES]->(:Sim)

Every node of type Endpoint may be connected to multiple Repeater or Gateway nodes: (:Endpoint)-[:CONNECTED_TO]->(:Repeater|:Gateway)

Every node of type Repeater may be connected to multiple Repeater or Gateway nodes: (:Repeater)-[:CONNECTED_TO]->(:Repeater|:Gateway)

I am trying to get all the Repeater and Endpoint nodes that are connected only to Gateway nodes using Sim nodes whose iccid is in a given range.

Any idea what I am doing wrong?


Solution 2

I think I finally got it:

MATCH (a:S)<-[:USES]-(b:G)
WHERE a.idStr IN ['1a','b2','something']
WITH COLLECT(b) AS GroupGs
MATCH (c)-[:CONNECTED_TO*]->(d:G)
WHERE NOT d IN GroupGs
WITH COLLECT(c) AS badCandidates,GroupGs
MATCH (e)-[:CONNECTED_TO*]->(f:G)
WHERE NOT e IN badCandidates AND f IN GroupGs
RETURN e

First I get GroupGs: all the G nodes that use an S node with an idStr property in the given range.

Then I collect all the C and R nodes that are connected to a G node not in GroupGs, and I call them badCandidates.

Finally, I get all the C and R nodes that are not in the badCandidates collection and are connected to a G node in GroupGs.
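The three steps can also be modelled outside Neo4j with plain set operations. The following Python sketch is only an illustration of the logic under assumed sample data (the node names and the edge list are hypothetical, not taken from the console example); it is not a replacement for the query.

```python
# Hypothetical sample data, invented for this sketch.
USES = {"g1": "1a", "g2": "b2", "g_bad": "zz"}            # G node -> idStr of its S node
EDGES = [("c1", "r1"), ("r1", "g1"), ("r1", "g2"),
         ("c2", "r2"), ("c2", "g1"), ("r2", "g_bad")]     # CONNECTED_TO relationships

def connected_pairs():
    """All (source, G) pairs linked by a CONNECTED_TO path of length >= 1."""
    pairs = set(EDGES)
    changed = True
    while changed:                                        # naive transitive closure
        new = {(a, d) for (a, b) in pairs for (c, d) in EDGES if b == c} - pairs
        pairs |= new
        changed = bool(new)
    return {(src, dst) for (src, dst) in pairs if dst in USES}

def solve(id_range):
    pairs = connected_pairs()
    # Step 1: G nodes whose S node has an idStr in the range.
    group_gs = {g for g, id_str in USES.items() if id_str in id_range}
    # Step 2: nodes that reach at least one G node outside GroupGs.
    bad_candidates = {src for (src, g) in pairs if g not in group_gs}
    # Step 3: nodes that reach a G node in GroupGs and are not bad candidates.
    return {src for (src, g) in pairs
            if src not in bad_candidates and g in group_gs}

print(sorted(solve({"1a", "b2", "something"})))  # ['c1', 'r1']
```

Building the result as a set also removes duplicates, which the query itself may produce when a node reaches GroupGs through several paths.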

Here you have an example: [Neo4j Console Test]

I hope this helps someone.

OTHER TIPS

Your query is really confusing things with the variables you choose -- binding "a" to label S's, and "b" to label G's? Later binding "c" to "G's" in the second match clause? This query is going to be hard to debug in the future, and makes it hard to see what's going on; consider binding label "G" to "g", or "gs", or similar, and so on.

I think your problem is the second match clause. The (c:G) in the second match clause doesn't relate to anything in the first (which is (b:G)). This means that the path via a set of CONNECTED_TO* relationships from some node to some (c:G) has nothing to do with the complex match on the first line of the query. This second match matches anything labeled G, not just the things you specify in the first match.
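Another thing that may be contributing: in Cypher, every non-aggregated expression in a WITH clause acts as a grouping key, so COLLECT(c) is evaluated once per distinct cs value (the non-G nodes of one path) rather than once per starting node. The Python sketch below models that difference with made-up rows; the node names and the row layout are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical rows from the path MATCH: (start node, non-G nodes on the path, end G node).
rows = [
    ("c2", ("c2",), "g1"),            # c2 -> g1 directly; g1 is in range
    ("c2", ("c2", "r2"), "g_bad"),    # c2 -> r2 -> g_bad; g_bad is out of range
]
group_gs = {"g1", "g2"}

def surviving_keys(key_of):
    """Group rows by key_of(row), then keep keys whose collected G nodes are all in group_gs."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_of(row)].append(row[2])
    return {key for key, gs in groups.items() if all(g in group_gs for g in gs)}

by_path_nodes = surviving_keys(lambda row: row[1])   # models grouping on cs
by_start_node = surviving_keys(lambda row: row[0])   # what the requirement needs

print(by_path_nodes)   # {('c2',)}
print(by_start_node)   # set()
```

Grouped by path nodes, c2 survives because its direct path to g1 is checked in isolation; grouped by starting node, it is rejected because g_bad ends up in the same collection.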

That second match is bad because of the requirement you stated:

only to G nodes that use S nodes with an idStr in that range

I don't have your test data, so I can't verify that this works. But here's something to try instead:

MATCH (a:S)<-[:USES]-(b:G)<-[:CONNECTED_TO*]-(n:C),
      p=(n)-[:CONNECTED_TO*]->(b:G)
WHERE a.idStr IN ['1a','b2','something']
WITH COLLECT(DISTINCT b) AS GroupGs,
     FILTER(x IN NODES(p) WHERE NOT x:G) AS cs,COLLECT(b) AS gs
WHERE ALL(x IN gs WHERE x IN GroupGs)
RETURN cs

Apologies if the syntax edited here isn't perfect; this is a complex query and is going to take some fiddling, but I think the placement and mis-labeling of that second MATCH is your issue. My solution may not be perfect and may require tinkering, but should get you there.

Licensed under: CC-BY-SA with attribution