Question

I recently started working with Neo4j and graph databases. I am using this API to persist my model. Everything is done and working, but my problem is efficiency.

First, the scenario. I have a couple of XML documents that translate to some nodes and the relationships between them. As I have read that this API does not yet support batch insertion, I am creating the nodes and relationships one at a time.

This is the code I am using to create a node:

var newEntry = new EntryNode { hash = incremento++.ToString() };
var result = client.Cypher
    .Merge("(entry:EntryNode {hash: {_hash} })")
    .OnCreate()
    .Set("entry = {newEntry}")
    .WithParams(new
    {
        _hash = newEntry.hash,
        newEntry
    })
    .Return(entry => new
    {
        EntryNode = entry.As<Node<EntryNode>>()
    });

I understand that it takes time to create all the nodes, but I do not understand why the time to create each one increases so fast. I ran some tests and am stuck at the point where creating an EntryNode initially takes 0.2 seconds per statement, but once the count reaches 500 that has increased to ~2 seconds. I also created an index on EntryNode(hash) manually in the console before inserting any data, and tested both versions, with and without the index.

Am I doing something wrong? Is this time normal?

EDITED: @Tatham Thanks for the answer, it really helped. Now I am using the FOREACH statement in Neo4jClient to create 1000 nodes in just 2 seconds.

On a related topic, now that I create the nodes this way, I also want to create relationships. This is the code I am trying right now, but I get some errors.

client.Cypher
                .Match("(e:EntryNode)")
                .Match("(p:EntryPointerNode)")
                .ForEach("(n in {set} | " +
                         "FOREACH (e in (CASE WHEN e.hash = n.EntryHash THEN [e] END) " +
                         "FOREACH (p in pointers (CASE WHEN p.hash = n.PointerHash THEN [p] END) "+
                         "MERGE ((p)-[r:PointerToEntry]->(ee)) )))")

                .WithParam("set", nodesSet)
                .ExecuteWithoutResults();

What I want it to do is, given a list of pairs of strings, find the nodes (which are unique) whose "hash" property matches each string and create a relationship between them. I have tried a couple of variants of this query, but I can't seem to find the solution.

Is this possible?

Solution

This approach is going to be very slow because you do a separate HTTP call to Neo4j for every node you are inserting. Each call is then a transaction. Finally, you are also returning the node back, which is probably a waste.

There are two options for doing this in batches instead.

From https://stackoverflow.com/a/21865110/211747, you can do something like this, where you pass in a set of objects and then FOREACH through them in Cypher. This means one larger HTTP call to Neo4j, which then executes in a single transaction on the DB:

FOREACH (n in {set} | MERGE (c:Label {Id : n.Id}) SET c = n)

http://docs.neo4j.org/chunked/stable/query-foreach.html
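
As a rough sketch of how the FOREACH query above might be issued from Neo4jClient, the snippet below sends all nodes in one call. It uses the ForEach, WithParam and ExecuteWithoutResults calls that appear in the question. The server URL, the EntryNode class shape and the batch size of 1000 are assumptions for illustration, and the {set} parameter syntax matches the Neo4j versions discussed here.

```csharp
using System;
using System.Collections.Generic;
using Neo4jClient;

// Assumed node type; the property name mirrors the question's EntryNode.
public class EntryNode
{
    public string hash { get; set; }
}

public static class BatchInsertSketch
{
    public static void Main()
    {
        // Assumed endpoint; adjust to your own server.
        var client = new GraphClient(new Uri("http://localhost:7474/db/data"));
        client.Connect();

        // Build the whole batch in memory first.
        var entries = new List<EntryNode>();
        for (var i = 0; i < 1000; i++)
        {
            entries.Add(new EntryNode { hash = i.ToString() });
        }

        // One HTTP call and one transaction for the entire set,
        // with nothing returned to the client.
        client.Cypher
            .ForEach("(n in {set} | MERGE (e:EntryNode {hash: n.hash}) SET e = n)")
            .WithParam("set", entries)
            .ExecuteWithoutResults();
    }
}
```

For very large inputs it may be worth splitting the list into several calls of a few thousand items each, so that no single transaction grows too big.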

The other option, coming soon, is that you will be able to write something like this in Cypher:

LOAD CSV WITH HEADERS FROM 'file://c:/temp/input.csv' AS n
MERGE (c:Label { Id : n.Id })
SET c = n

https://github.com/davidegrohmann/neo4j/blob/2.1-fix-resource-failure-load-csv/community/cypher/cypher/src/test/scala/org/neo4j/cypher/LoadCsvAcceptanceTest.scala

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow