Question

I want to store some data in my Neo4j database. I use Spring Data Neo4j for that.

My code looks like this:

    for (int i = 0; i < newRisks.size(); i++) {
        myRepository.save(newRisks.get(i));
        System.out.println("saved " + newRisks.get(i).name);
    }

My newRisks list contains about 60,000 objects and 60,000 edges. Every node and edge has one property. The loop takes about 15 to 20 minutes. Is this normal? I used Java VisualVM to look for bottlenecks, but my average CPU usage was only 10 to 25% (of 4 cores) and my heap was less than half full.

Are there any options to speed up this operation?


EDIT: Additionally, on the first call of myRepository.save(newRisks.get(i)), the JVM seems to hang for several minutes before the first output appears.

Second EDIT:

Class Risk:

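    import java.util.HashSet;
    import java.util.Set;

    import org.neo4j.graphdb.Direction;
    import org.springframework.data.neo4j.annotation.Indexed;
    import org.springframework.data.neo4j.annotation.NodeEntity;
    import org.springframework.data.neo4j.annotation.RelatedTo;

    @NodeEntity
    public class Risk {
        //...
        @Indexed
        public String name;

        // outgoing CHILD relationships to the child risks
        @RelatedTo(type = "CHILD", direction = Direction.OUTGOING)
        Set<Risk> risk = new HashSet<Risk>();

        public void addChild(Risk child) {
            risk.add(child);
        }

        //...
    }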

Creating Risks:

    @Autowired
    private Repository myRepository;

    @Transactional
    public Collection<Risk> makeSomeRisks() {

        ArrayList<Risk> newRisks = new ArrayList<Risk>();

        newRisks.add(new Risk("Root"));

        for (int i = 0; i < 60000; i++) {
            Risk risk = new Risk("risk " + (i + 1));
            newRisks.get(0).addChild(risk);
            newRisks.add(risk);
        }

        for (int i = 0; i < newRisks.size(); i++) {
            myRepository.save(newRisks.get(i));
        }

        return newRisks;
    }

Solution

The problem here is that you are doing mass inserts with an API that is not intended for that.

You create one Risk with 60k children. When you save the root first, that save also persists all 60k children (and creates the relationships), which is why the first save takes so long. Then you save every child again.
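To make the cost of this pattern concrete, here is a toy in-memory model. It is hypothetical and is not the SDN API: its save() writes an entity and then cascades over the entity's child collection, which is how the root save is described above. It only counts writes for the question's workload and does not measure SDN's real per-entity overhead.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, NOT Spring Data Neo4j): save() counts the writes a
// save that also persists the entity's relationship collection would perform.
public class CascadeSaveModel {

    static final class Entity {
        final String name;
        final List<Entity> children = new ArrayList<>();
        Entity(String name) { this.name = name; }
    }

    long nodeWrites = 0;          // includes redundant re-saves of already written nodes
    long relationshipWrites = 0;  // relationships created while cascading

    // Writes the entity, then each child and the relationship to it.
    // Assumes a tree, as in the question, so no cycle detection is needed.
    void save(Entity e) {
        nodeWrites++;
        for (Entity child : e.children) {
            relationshipWrites++;
            save(child);
        }
    }

    // Builds one root with n children, then saves root and every child in order,
    // like the loop in makeSomeRisks().
    // Returns {node writes after the first save, relationships after the first save, total node writes}.
    public static long[] simulate(int n) {
        List<Entity> risks = new ArrayList<>();
        risks.add(new Entity("Root"));
        for (int i = 0; i < n; i++) {
            Entity risk = new Entity("risk " + (i + 1));
            risks.get(0).children.add(risk);
            risks.add(risk);
        }
        CascadeSaveModel repo = new CascadeSaveModel();
        repo.save(risks.get(0));                  // first save cascades to all children
        long nodesAfterRoot = repo.nodeWrites;
        long relsAfterRoot = repo.relationshipWrites;
        for (int i = 1; i < risks.size(); i++) {  // every child is then saved a second time
            repo.save(risks.get(i));
        }
        return new long[] { nodesAfterRoot, relsAfterRoot, repo.nodeWrites };
    }

    public static void main(String[] args) {
        long[] r = simulate(60000);
        System.out.println("node writes during first save: " + r[0]);
        System.out.println("relationships created during first save: " + r[1]);
        System.out.println("total node writes: " + r[2]);
    }
}
```

For 60,000 children the model reports 60,001 node writes and 60,000 relationships inside the first save alone, and 120,001 node writes in total, so roughly half of the save work in the loop is redundant.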

There are some solutions to speed it up with SDN.

  1. Don't use the collection approach for mass inserts. Persist both participants and use template.createRelationshipBetween(root, child, "CHILD", false);

  2. Persist the children first, then add all the persisted children to the root object and persist that.

  3. As you did, use the Neo4j core API, but call template.postEntityCreation(node, Risk.class) so that you can access the entities via SDN. You then also have to index the entities yourself (db.index().forNodes("Risk").add(node, "name", name);), or use the Neo4j core API auto-index, but that is not compatible with SDN.

  4. Whether you use the core API or SDN, you should use transaction sizes of around 10-20k nodes/relationships for best performance.
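Tip 4 can be sketched as a small batching helper. TxBatcher and runInBatches are hypothetical names, not part of SDN or Neo4j. The helper only splits the work into chunks; where the Neo4j 1.x core API transaction calls used in this thread (beginTx(), success(), finish()) would go is marked in a comment as an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: splits a bulk insert into fixed-size chunks so that each
// chunk can be committed in its own transaction.
public class TxBatcher {

    // Calls work once per consecutive chunk of at most batchSize items;
    // returns the number of chunks (i.e. transactions).
    public static <T> int runInBatches(List<T> items, int batchSize, Consumer<List<T>> work) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            work.accept(items.subList(from, to));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i <= 60000; i++) {
            names.add(i == 0 ? "Root" : "risk " + i);
        }
        List<Integer> sizes = new ArrayList<>();
        int count = runInBatches(names, 10000, chunk -> {
            // Assumption: with the core API this is where you would call
            // graphDb.beginTx(), create the chunk's nodes and CHILD relationships,
            // then tx.success() and, in a finally block, tx.finish().
            sizes.add(chunk.size());
        });
        System.out.println(count + " transactions, sizes " + sizes);
    }
}
```

With 60,001 nodes and a batch size of 10,000 this yields seven transactions (six of 10,000 and one of 1). Since every child brings one node and one relationship, a batch of 10,000 risks is roughly 20k operations, the upper end of the suggested range.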

OTHER TIPS

I think I've found a solution:

I tried the same insert using the native Neo4j Java API:

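    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Relationship;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.kernel.EmbeddedGraphDatabase;

    // RelTypes is an enum implementing org.neo4j.graphdb.RelationshipType with a CHILD constant

    GraphDatabaseService graphDb;
    Node firstNode;
    Node secondNode;
    Relationship relationship;

    graphDb = new EmbeddedGraphDatabase(DB_PATH);
    // one transaction for all 60,001 nodes and 60,000 relationships
    Transaction tx = graphDb.beginTx();

    try {
        firstNode = graphDb.createNode();
        firstNode.setProperty( "name", "Root" );

        for (int i = 0; i < 60000; i++) {
            secondNode = graphDb.createNode();
            secondNode.setProperty( "name", "risk " + (i+1));

            relationship = firstNode.createRelationshipTo( secondNode, RelTypes.CHILD );
        }
        tx.success();   // mark the transaction as successful so finish() commits it
    }
    finally {
        tx.finish();    // commits (or rolls back if success() was not called)
        graphDb.shutdown();
    }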

The result: after a few seconds, the database is filled with risks.

Maybe reflection slows down this routine in Spring Data Neo4j. @Michael Hunger says something like that in his book Good Relationships; thanks for that tip.

Do inserts into your database (outside of Java) have the same delay, or is this a problem only with Spring Data?

I faced the same problem as the OP. What really helped in my case was switching Neo4j from remote server mode to embedded mode. A good example of embedded SDN usage can be found here.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow