Question

I am populating a graph in neo4j from an SQLite3 database, using py2neo with Python 3.2 on Ubuntu Linux. Although speed is not of the utmost concern, the graph has only reached 40K rows (one relationship per SQL row) in about 3 hours, out of a total of 5 million rows.

Here is the main loop:

from py2neo import neo4j as neo
import sqlite3 as sql

# sql_cursor (a sqlite3 cursor) and neo4j_db (a py2neo graph handle)
# are assumed to have been set up earlier

# select all 5M rows from the SQL database
sql_str = """select * from bigram_with_number"""

# loop through each row
for (freq, first, firstfreq, second, secondfreq) in sql_cursor.execute(sql_str):

    # build the Cypher query (Cypher 2.0 syntax) using MERGE,
    # so that nodes are created only if they do not already exist
    query = neo.CypherQuery(neo4j_db, """
        CYPHER 2.0
        MERGE (n:word {form: {firstvar}, freq: {freqfirst}})
        MERGE (m:word {form: {secondvar}, freq: {freqsecond}})
        CREATE UNIQUE (n)-[:bigram {freq: {freqbigram}}]->(m)
        RETURN n, m""")

    # execute the query with parameters taken from the SQL row
    result = query.execute(freqbigram=freq, firstvar=first, freqfirst=firstfreq,
                           secondvar=second, freqsecond=secondfreq)

Although the database populates correctly, at this rate it will take weeks to finish. I suspect it is possible to do this faster.


Solution

For bulk loading, you're probably better off bypassing the REST interface and using something lower-level, such as Michael Hunger's load tools: https://github.com/jexp/neo4j-shell-tools. Even at optimal performance, the REST interface is unlikely ever to achieve the speeds you're looking for, since each query costs a full HTTP round trip.
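As a rough sketch of the first step of that approach: the SQLite table can be exported to a tab-separated file, which a bulk loader such as neo4j-shell-tools' import-cypher command can then stream through a parameterised Cypher statement in batched transactions. The table and column names below come from the question; the file names (`bigrams.db`, `bigrams.tsv`) are hypothetical.

```python
import csv
import sqlite3

def export_bigrams(db_path, out_path):
    """Dump the bigram_with_number table to a TSV file with a header
    row, suitable for a bulk import tool. Column names in the header
    become the parameter names available to the import query."""
    conn = sqlite3.connect(db_path)
    try:
        with open(out_path, "w", newline="") as f:
            writer = csv.writer(f, delimiter="\t")
            writer.writerow(["freq", "first", "firstfreq",
                             "second", "secondfreq"])
            for row in conn.execute("select * from bigram_with_number"):
                writer.writerow(row)
    finally:
        conn.close()
```

Writing the whole table out once and letting a shell-side tool batch the inserts avoids the per-row HTTP request that dominates the running time of the original loop.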

As an aside, please note that I don't officially support Python 3.2, although I do support 3.3.

Licensed under: CC-BY-SA with attribution