Question

This is a follow-up to another SO question (Neo4j 2.0 Merge with unique constraints performance bug?), but I'm trying it a different way.

MATCH (c:Contact),(a:Address), (ca:ContactAddress)
WITH c,a,collect(ca) as matrix
FOREACH (car in matrix | 
MERGE 
(c {ContactId:car.ContactId})
-[r:CONTACT_ADDRESS {ContactId:car.ContactId,AddressId:car.AddressId}]->
(a {AddressId:car.AddressId}))

So this leads to a locked up Neo4j server. I'm trying to wrap my head around why.
My thought process behind the query is the following:

  • I want to select all Contact and Address nodes (as well as ContactAddress nodes)
  • I want to loop through all ContactAddress nodes (which contain the relationship data between Contact and Address) and relate the Contact and Address nodes to each other.

When I run the above code, the server sits at about 40% CPU and memory continues to climb. I stopped it after the browser (myserver:7474/browser) disconnected, reset my database and tried again, this time using the following:

match (c:Contact),(a:Address), (ca:ContactAddress)
WITH c,a,collect(distinct ca) as matrix
foreach (car in matrix | 
CREATE 
(c {ContactId:car.ContactId})
-[r:CONTACT_ADDRESS {ContactId:car.ContactId,AddressId:car.AddressId}]->
(a {AddressId:car.AddressId}))

Same results. Locked up, disconnected Neo4j database while CPU stays pegged and RAM usage continues to climb. Is there a loop here that I'm not seeing?

I've also tried this (with the same hang):

FOREACH(row in {PassedInList} | 
    MERGE (c:Contact {ContactId:row.ContactId})
    MERGE (a:Address {AddressId:row.AddressId})
    MERGE (c)-[r:CONTACT_ADDRESS]->(a)
    )

RESOLVED:

MATCH (ca:ContactAddress)
MATCH (c:Contact {ContactId:ca.ContactId}), (a:Address {AddressId:ca.AddressId})
MERGE p = (c)
          -[r:CONTACT_ADDRESS {ContactId:ca.ContactId,AddressId:ca.AddressId}]->
          (a)

Solution

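When you write match (c:Contact),(a:Address), (ca:ContactAddress) with three disconnected node patterns, Neo4j returns the Cartesian product of all three. If you had 100 nodes of each type, that is 100 x 100 x 100 = 1,000,000 rows. Grouping with collect(ca) does not reduce the work: the FOREACH still runs once per collected element for every (c, a) pair, so the MERGE or CREATE is attempted a million times.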
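To get a feel for how large that product is on your own data, a rough sketch like the following counts each label separately and multiplies the counts. The aliases (contacts, addresses, links, combinations) are only illustrative names, and the query assumes each label has at least one node.

```cypher
// Count each label on its own, carrying the earlier counts forward
MATCH (c:Contact)
WITH count(c) AS contacts
MATCH (a:Address)
WITH contacts, count(a) AS addresses
MATCH (ca:ContactAddress)
WITH contacts, addresses, count(ca) AS links
// Rows the three-way disconnected MATCH would have to produce
RETURN contacts, addresses, links,
       contacts * addresses * links AS combinations
```

Each MATCH here runs against a single row from the previous aggregation, so this query itself stays cheap even when the product is huge.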

Try this:

MATCH (ca:ContactAddress), (c:Contact {ContactId:ca.ContactId}), (a:Address {AddressId:ca.AddressId})
MERGE (c)-[r:CONTACT_ADDRESS {ContactId:ca.ContactId,AddressId:ca.AddressId}]->(a)

That will match every :ContactAddress node, and only the :Contact and :Address nodes that match it. Then it'll create the relationship (if it didn't already exist).
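One way to sanity-check the result afterwards is to compare the number of :ContactAddress nodes with the number of CONTACT_ADDRESS relationships. This is only a sketch: it assumes every :ContactAddress node refers to an existing :Contact and :Address, and that no two :ContactAddress nodes carry the same ContactId/AddressId pair. If either assumption does not hold, the two numbers can legitimately differ.

```cypher
// Number of link nodes that should each have produced a relationship
MATCH (ca:ContactAddress)
WITH count(ca) AS linkNodes
// OPTIONAL MATCH keeps the row even when no relationships exist yet
OPTIONAL MATCH (:Contact)-[r:CONTACT_ADDRESS]->(:Address)
RETURN linkNodes, count(r) AS relationships
```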

If you want to be clearer, you could also split the MATCH, i.e.:

MATCH (ca:ContactAddress)
MATCH (c:Contact {ContactId:ca.ContactId}), (a:Address {AddressId:ca.AddressId})
MERGE (c)-[r:CONTACT_ADDRESS {ContactId:ca.ContactId,AddressId:ca.AddressId}]->(a)
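The property lookups in the second MATCH are much cheaper when ContactId and AddressId are indexed. Assuming those ids really are unique per node (the question does not state this outright), uniqueness constraints are one option, since they also create an index. The statements below use the Neo4j 2.x syntax that matches the version in the question; newer releases use a FOR ... REQUIRE form instead. Run each statement separately, before the MATCH/MERGE query.

```cypher
// Assumption: ContactId is unique across :Contact nodes
CREATE CONSTRAINT ON (c:Contact) ASSERT c.ContactId IS UNIQUE;

// Assumption: AddressId is unique across :Address nodes
CREATE CONSTRAINT ON (a:Address) ASSERT a.AddressId IS UNIQUE;
```

If the ids are not guaranteed unique, a plain index such as CREATE INDEX ON :Contact(ContactId) gives the lookup benefit without enforcing uniqueness.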
Licensed under: CC-BY-SA with attribution