Question

I'm trying to batch import millions of nodes through Py2Neo. I don't know which is faster, WriteBatch or cypher.Transaction, but the latter seemed the better option as I need to split my batches. However, when I try to execute a simple transaction, I receive a weird error.

The Python code:

session = cypher.Session("http://127.0.0.1:7474/db/data/") #error also w/o /db/data/

def init():
    tx = session.create_transaction()

    for ngram, one_grams in data.items():
         tx.append("CREATE "+str(n)+":WORD {'word': "+ngram+", 'rank': "+str(ngram_rank)+", 'prob': "+str(ngram_prob)+", 'gram': '0gram'}")
         tx.execute()  # line 69 in the error below

The error:

Traceback (most recent call last):
  File "Ngram_neo4j.py", line 176, in <module>
    init(rNgram_file="dataset_id.json")
  File "Ngram_neo4j.py", line 43, in init
    data = probability_items(data)
  File "Ngram_neo4j.py", line 69, in probability_items
    tx.execute()
  File "D:\datasets\GOOGLE~1\virtenv\lib\site-packages\py2neo\cypher.py", line 224, in execute
    return self._post(self._execute or self._begin)
  File "D:\datasets\GOOGLE~1\virtenv\lib\site-packages\py2neo\cypher.py", line 209, in _post
    raise TransactionError(error["code"], error["status"], error["message"])
KeyError: 'status'

I tried catching the exception:

 except cypher.TransactionError as e:
        print("--------------------------------------------------------------------------------------------")
        print(e.status)
        print(e.message)

But it never gets called (maybe an error on my part?).

Regular inserts using graph_db.create({"node:" node}) do work, but they are incredibly slow (36 hours for 2.5M nodes). Note that the dataset consists of a series of JSON files, each with a structure up to 5 levels deep. I'd like to batch the last 2 levels (around 100 to 20,000 nodes per batch).

--- EDIT ---

I'm using Py2Neo 1.6.1 and Neo4j 2.0.0, currently on Windows 7 (but also on OS X Mavericks and CentOS 6).


Solution

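The problem you're seeing is due to a last-minute change in the way the Neo4j server reports Cypher transaction errors. Py2neo 1.6 was built against the Neo4j 2.0 M05/M06 milestones, and when a few features changed in RC1/GA, Py2neo broke in a few places. In this case the error returned by the server no longer contains the "status" key that Py2neo 1.6 expects, hence the KeyError.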

This has been fixed for Py2neo 1.6.2 (https://github.com/nigelsmall/py2neo/issues/224) but I do not yet know when I will get a chance to finish and release this version.
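The traceback in the question also suggests why the except clause there never runs: the KeyError is raised while the arguments for TransactionError are still being evaluated, so no TransactionError is ever created. The sketch below reproduces that behaviour in isolation. The TransactionError class is a simplified stand-in, not py2neo's actual class, and both error payloads are made-up illustrations of an "old-style" and a "new-style" server response.

```python
class TransactionError(Exception):
    """Simplified stand-in for py2neo's cypher.TransactionError (hypothetical)."""

    def __init__(self, code, status, message):
        super().__init__(message)
        self.code = code
        self.status = status


def raise_from_server_error(error):
    # Same shape as the failing line in the traceback: error["status"] is
    # looked up before the exception object exists, so a missing key
    # surfaces as a KeyError instead.
    raise TransactionError(error["code"], error["status"], error["message"])


def classify(error):
    """Return the name of the exception type that actually propagates."""
    try:
        raise_from_server_error(error)
    except TransactionError:
        return "TransactionError"
    except KeyError:
        return "KeyError"


# Illustrative payloads only; the real server responses may differ.
old_style = {"code": 42000, "status": "STATEMENT_SYNTAX_ERROR", "message": "Invalid input"}
new_style = {"code": "Neo.ClientError.Statement.InvalidSyntax", "message": "Invalid input"}

print(classify(old_style))  # TransactionError
print(classify(new_style))  # KeyError
```

Until a fixed release is installed, catching KeyError around tx.execute() would at least keep the script from crashing, although the server's real error message (most likely a Cypher syntax problem) is lost that way.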

OTHER TIPS

What neo4j and py2neo versions are you using?

You should use parameters for your create statements.
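As a rough sketch of what that could look like: the statement text stays constant and the values travel separately as a parameter map, which avoids quoting problems and lets the server reuse the query plan. The `{word}` placeholder style is the Neo4j 2.0 parameter syntax. The data shape (word mapped to a rank/probability pair), the helper names, and the batch size are assumptions made for illustration; `session` is expected to behave like the cypher.Session from the question, with `tx.append(statement, parameters)` and `tx.commit()` as I understand the py2neo 1.6 transaction API.

```python
# One constant statement; values are supplied through parameters.
CREATE_WORD = "CREATE (n:WORD {word: {word}, rank: {rank}, prob: {prob}, gram: {gram}})"


def word_statements(data, gram="0gram"):
    """Yield (statement, parameters) pairs, one per word.

    `data` is assumed to map word -> (rank, prob); adapt to the real JSON layout.
    """
    for word, (rank, prob) in data.items():
        yield CREATE_WORD, {"word": word, "rank": rank, "prob": prob, "gram": gram}


def chunked(items, size):
    """Group an iterable into lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_in_batches(session, pairs, batch_size=1000):
    """Send each batch in its own transaction; return the number of statements sent."""
    total = 0
    for batch in chunked(pairs, batch_size):
        tx = session.create_transaction()
        for statement, parameters in batch:
            tx.append(statement, parameters)
        tx.commit()  # one round trip per batch rather than one per statement
        total += len(batch)
    return total
```

With a live server this would be called as run_in_batches(session, word_statements(data), batch_size=1000). Note that the code in the question calls tx.execute() inside the loop, which sends every statement on its own; appending a whole batch before committing is what makes the transaction approach faster.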

Can you check the server logs in data/logs and data/graph.db/messages.log for errors?

If you have so much data to insert then perhaps direct batch-insertion would make more sense?

See: http://neo4j.org/develop/import

Two tools I wrote for this:

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow