Question

I am trying to create 500,000 nodes in a graph database; I plan to add edges later as my requirements evolve. I have a text file with 500,000 lines, one per node, containing the data to be stored on each node.

from bulbs.neo4jserver import Graph, Config, NEO4J_URI
config = Config(NEO4J_URI)
g = Graph(config)

def get_or_create_node(text, crsqid):
    v = g.vertices.index.lookup(crsqid=crsqid)
    if v is None:
        v = g.vertices.create(crsqid=crsqid)
        print text + " - node created"
    v.text = text
    v.save()
    return v

I then loop over each line in the text file:

count = 1
with open('titles-sorted.txt') as f:
    for line in f:
        get_or_create_node(line, count)
        count += 1

This is terribly slow: it gives me only about 5,000 nodes in 10 minutes. Can this be improved? Thanks


Solution

I don't see any transaction handling in there -- nothing establishing a transaction or signaling its success. You should look into that: if you're doing one transaction for every single node creation, that's going to be slow. Instead, open one transaction, insert thousands of nodes, then commit the whole batch at once.

I'm not familiar with bulbs, so I can't tell you how to do that with that particular Python framework, but here is a place to start: this page suggests you can use a coding style like this with some python/neo bindings:

with db.transaction:
  foo()
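As a rough illustration of the batching idea (not bulbs-specific -- this is a minimal sketch assuming the old neo4j-embedded Python bindings, where GraphDatabase, db.transaction, and db.node come from the neo4j package; the path and batch size are placeholders):

from neo4j import GraphDatabase

db = GraphDatabase('/path/to/db')   # embedded database path -- placeholder

BATCH_SIZE = 10000
with open('titles-sorted.txt') as f:
    lines = f.readlines()

for start in range(0, len(lines), BATCH_SIZE):
    # one transaction per batch of creates, instead of one per node
    with db.transaction:
        for offset, text in enumerate(lines[start:start + BATCH_SIZE]):
            db.node(crsqid=start + offset + 1, text=text.strip())

db.shutdown()

The exact API will differ between bindings and versions; the point is the shape: many creates per commit.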

Also, if you're trying to load a large amount of data and you need performance, you should check this page for information on bulk importing. Doing it from your own script over REST is unlikely to be the most performant option. You might instead use your script to generate Cypher queries, which get piped to the neo4j-shell.
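For example, a small script could emit one CREATE statement per line, and you would then run something like neo4j-shell -file create_nodes.cql. This is only a sketch: the quoting is naive and the exact property syntax depends on your Cypher version.

with open('titles-sorted.txt') as f, open('create_nodes.cql', 'w') as out:
    for count, line in enumerate(f, 1):
        # naive escaping of single quotes, for illustration only
        text = line.strip().replace("'", "\\'")
        out.write("CREATE (n {crsqid: %d, text: '%s'});\n" % (count, text))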

Finally, a thing to consider is indexes. It looks like you're indexing on crsqid; if you get rid of that index, creates may go faster. I don't know how your IDs are distributed, but it might be better to break the records up into batches and test for existence in bulk, rather than using the get_or_create() pattern for every node.
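For instance, since crsqid is just a line counter here, a hypothetical variant of the loader could skip the index lookup entirely and track what it has already created in memory (this reuses the g object from the question; create_node and the created set are made-up names for illustration):

created = set()

def create_node(text, crsqid):
    # cheap in-memory membership test instead of an index lookup per node
    if crsqid in created:
        return None
    created.add(crsqid)
    return g.vertices.create(crsqid=crsqid, text=text.strip())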

OTHER TIPS

Loading 500k nodes one at a time via REST is not ideal. Use Michael's batch loader or the Gremlin shell -- see Marko's movie recommendation blog post for an example of how to do this from the Gremlin shell.
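If you go the batch-loader route, your script's job shrinks to writing out flat files for the importer to consume. A rough sketch follows; the exact header/column format the batch importer expects varies by version, so treat the header line below as an assumption:

with open('titles-sorted.txt') as f, open('nodes.csv', 'w') as out:
    out.write("crsqid:int\ttext\n")   # tab-separated header of property names (assumed format)
    for count, line in enumerate(f, 1):
        out.write("%d\t%s\n" % (count, line.strip()))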

Licensed under: CC-BY-SA with attribution