py2neo Cypher transactions failing
Question
I'm trying to batch import millions of nodes through Py2Neo.
I don't know which is faster, WriteBatch or cypher.Transaction, but the latter seemed the best option as I need to split my batches.
However, when I try to execute a simple transaction, I receive a weird error.
The Python code:
from py2neo import cypher

session = cypher.Session("http://127.0.0.1:7474/db/data/")  # same error without /db/data/

def init():
    tx = session.create_transaction()
    for ngram, one_grams in data.items():
        tx.append("CREATE "+str(n)+":WORD {'word': "+ngram+", 'rank': "+str(ngram_rank)+", 'prob': "+str(ngram_prob)+", 'gram': '0gram'}")
    tx.execute()  # line 69 in the error below
The error:
Traceback (most recent call last):
  File "Ngram_neo4j.py", line 176, in <module>
    init(rNgram_file="dataset_id.json")
  File "Ngram_neo4j.py", line 43, in init
    data = probability_items(data)
  File "Ngram_neo4j.py", line 69, in probability_items
    tx.execute()
  File "D:\datasets\GOOGLE~1\virtenv\lib\site-packages\py2neo\cypher.py", line 224, in execute
    return self._post(self._execute or self._begin)
  File "D:\datasets\GOOGLE~1\virtenv\lib\site-packages\py2neo\cypher.py", line 209, in _post
    raise TransactionError(error["code"], error["status"], error["message"])
KeyError: 'status'
I tried catching the exception:
try:
    tx.execute()
except cypher.TransactionError as e:
    print("--------------------------------------------------------------------------------------------")
    print(e.status)
    print(e.message)
But the handler never gets called (maybe an error on my part?).
Regular inserts using graph_db.create({"node": node}) do work, but they are incredibly slow (36 hours for 2.5M nodes). Note that the dataset consists of a series of JSON files, each nested up to 5 levels deep. I'd like to batch the last 2 levels (around 100 to 20,000 nodes per batch).
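Splitting the items into groups of that size does not depend on which py2neo API ends up sending them. A minimal sketch in plain Python, assuming a batch is simply the next N items of an iterable (walking the nested JSON to produce those items is left out, and the helper name batched is hypothetical):

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items, preserving input order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        # Pull up to `size` items from the shared iterator.
        chunk = list(islice(iterator, size))
        if not chunk:  # input exhausted
            return
        yield chunk
```

For example, batched(data.items(), 5000) would yield lists of (ngram, one_grams) pairs, one list per request. Python 3.12 and later also ship itertools.batched, which yields tuples instead of lists.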
--- EDIT ---
I'm using Py2neo 1.6.1 and Neo4j 2.0.0, currently on Windows 7 (but also OS X Mavericks and CentOS 6).
Solution
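The problem you're seeing is due to a last-minute change in the way the Neo4j server reports Cypher transaction errors. Py2neo 1.6 was built against the Neo4j 2.0 milestones M05/M06, and when a few features changed in RC1/GA, Py2neo broke in a few places. Here, the error returned by the server has no "status" key, so evaluating error["status"] raises a KeyError before the TransactionError can even be constructed; that is also why your except cypher.TransactionError clause is never reached.
This has been fixed for Py2neo 1.6.2 (https://github.com/nigelsmall/py2neo/issues/224), but I don't yet know when I will get a chance to finish and release that version.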
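A self-contained illustration of that failure mode, using a hypothetical stand-in class rather than py2neo's actual source. The last frame of the traceback builds the exception from error["code"], error["status"] and error["message"], so a response without "status" fails before anything is raised:

```python
class TransactionError(Exception):
    """Hypothetical stand-in for py2neo's cypher.TransactionError (not py2neo's code)."""


def raise_for_error(error):
    # Same shape as the failing line in the traceback: the arguments are
    # evaluated before the exception object exists, so a missing "status"
    # key raises KeyError and TransactionError is never raised.
    raise TransactionError(error["code"], error["status"], error["message"])


def outcome(error):
    """Report which exception actually escapes for a given error dict."""
    try:
        raise_for_error(error)
    except TransactionError:
        return "TransactionError"
    except KeyError as e:
        return "KeyError: %s" % e
```

Until 1.6.2 is available, adding except KeyError next to except cypher.TransactionError would at least catch the failure, although the KeyError itself does not carry the server's error message.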
Other tips
Which Neo4j and py2neo versions are you using?
You should use parameters for your create statements.
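Applied to the question's code, a minimal sketch. It assumes py2neo 1.6's cypher.Transaction offers append(statement, parameters) and execute(), and it uses Neo4j 2.0's {name} parameter syntax with a single map parameter holding the node properties. The (ngram, rank, prob) row shape and the function names are hypothetical:

```python
# Neo4j 2.0 parameter syntax is {name}; the $name form only came in later versions.
# A map parameter supplies all node properties at once.
CREATE_WORD = "CREATE (w:WORD {props})"


def build_statements(rows, gram="0gram"):
    """Turn (ngram, rank, prob) tuples into (statement, parameters) pairs.

    Values travel as parameters, so quotes or other special characters in an
    n-gram cannot break the Cypher text, and the statement text stays constant.
    """
    return [
        (CREATE_WORD, {"props": {"word": ngram, "rank": rank, "prob": prob, "gram": gram}})
        for ngram, rank, prob in rows
    ]


def send_batch(tx, statements):
    """Queue every statement on an open transaction, then send them in one request.

    `tx` is assumed to behave like a py2neo 1.6 cypher.Transaction.
    """
    for statement, parameters in statements:
        tx.append(statement, parameters)
    return tx.execute()
```

A run would then look roughly like tx = session.create_transaction(), one send_batch(tx, build_statements(rows)) call per chunk, and a final tx.commit(). Whether one long transaction or one transaction per chunk is faster at this scale is worth measuring.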
Can you check the server logs in data/logs and data/graph.db/messages.log for errors?
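When the logs are long, filtering them first can help. A small sketch; the marker strings are an assumption about what error lines contain rather than a documented log format, and the paths are relative to the Neo4j installation directory:

```python
def error_lines(lines, markers=("ERROR", "Exception")):
    """Return (line_number, text) for every line containing one of `markers`."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in markers):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path, markers=("ERROR", "Exception")):
    """Apply error_lines to a log file such as data/graph.db/messages.log."""
    # errors="replace" keeps the scan going if the log has stray bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return error_lines(handle, markers)
```

For example, scan_log("data/graph.db/messages.log") returns the matching lines with their line numbers, which makes it easier to line them up with the time of the failed request.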
With this much data to insert, perhaps direct batch insertion would make more sense.
See: http://neo4j.org/develop/import
Two tools I wrote for this: