Question

NOTE

I let this become several questions instead of the simple one I asked, so I am breaking the follow-ups off into their own question here.

ORIGINAL QUESTION

I'm receiving a list of IDs and first testing whether any of them are in my graph; for those that /are/, I process the nodes further.

So, for example...

def id_is_in_graph(id):
    # One round trip to the database per ID, which is what makes this slow.
    query = """MATCH (user:User {{id_str:"{}"}})
    RETURN user
    """.format(id)
    n = neo4j.CypherQuery(graph_db, query).execute_one()
    return bool(n)

fids = get_fids(record)  # [100001, 100002, 100003, ... etc]
ids_in_my_graph = filter(id_is_in_graph, fids)  # [100002]

As you can imagine, using filter to test each ID against the graph one at a time is really, really slow, and is clearly not using neo4j properly.

How would I rephrase my query so that I could pass in a list, something like (User {id_str: [mylist]}), and get back only the IDs that are in my graph?


Solution

You may want to use WHERE ... IN, which takes advantage of Cypher's collection support.

So your query might look like this:

MATCH (user:User) 
WHERE user.id_str IN ["100001", "100002", "100003"]
RETURN user;

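Now, I don't know how large a collection can be, and I haven't tested this with something like 1,000 items. If very long lists turn out to be a problem, you can split the IDs into chunks and run one query per chunk. Either way, this needs far fewer round trips than one query per ID, so it should improve performance.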
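To make the chunking idea concrete, here is a minimal Python sketch. It is not tied to any particular driver: run_query is a hypothetical callable that takes a query string and a parameter dict and returns the matching id_str values, and chunk_size=500 is an arbitrary assumption, not a known limit. The {ids} placeholder follows the parameter style used elsewhere on this page, and the IDs are converted to strings on the assumption that id_str is stored as a string.

```python
def chunked(items, size):
    """Yield consecutive slices of items, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# One query per chunk instead of one query per ID.
BATCH_QUERY = (
    "MATCH (user:User) "
    "WHERE user.id_str IN {ids} "
    "RETURN user.id_str"
)


def ids_in_graph(fids, run_query, chunk_size=500):
    """Return the IDs from fids that run_query reports as present.

    run_query is a hypothetical callable (query, params) -> list of id_str
    values; wrap whatever your driver provides to match that shape.
    """
    fids = list(fids)
    found = []
    for chunk in chunked(fids, chunk_size):
        # Assumes id_str is stored as a string, as in the question's query.
        params = {"ids": [str(fid) for fid in chunk]}
        found.extend(run_query(BATCH_QUERY, params))
    return found


# Stand-in for the database so the sketch runs on its own.
GRAPH_IDS = {"100002"}


def fake_run_query(query, params):
    return [i for i in params["ids"] if i in GRAPH_IDS]


ids_in_my_graph = ids_in_graph(
    [100001, 100002, 100003], fake_run_query, chunk_size=2
)
```

Passing the list as a parameter rather than formatting it into the query text keeps the query string constant across chunks.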

Also have a look at the Collections section of the Cypher 2.0 refcard

OTHER TIPS

You should use Cypher with parameters, like {id}, and then pass the values (for example "id" -> record.id) when you execute the query:

MATCH (user:User {id_str:{user_id}}),(friend:User {id_str:{friend_id}})
CREATE UNIQUE (user)-[:FRIENDS]->(friend)

{ "user_id" : record.id, "friend_id" : i}

Make sure to add a uniqueness constraint on the property you look up, which also creates an index for it:

CREATE CONSTRAINT ON (u:User) ASSERT u.id_str IS UNIQUE;

And you can send multiple statements at once to the transactional HTTP endpoint for Cypher, which is probably already supported by your driver:

http://docs.neo4j.org/chunked/milestone/rest-api-transactional.html
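As a rough sketch of what "multiple statements at once" looks like, the Python below builds a JSON request body with one parameterized statement per friend, using the "statements" / "statement" / "parameters" layout described in the linked documentation. The function name build_payload and the sample IDs are made up for illustration, and actually sending the body (for example to the endpoint's commit URL) is left to your HTTP client or driver.

```python
import json

# Same parameterized statement as above; only the parameters change per friend.
FRIEND_QUERY = (
    "MATCH (user:User {id_str:{user_id}}),(friend:User {id_str:{friend_id}}) "
    "CREATE UNIQUE (user)-[:FRIENDS]->(friend)"
)


def build_payload(user_id, friend_ids):
    """Return a JSON body carrying one statement per friend ID.

    build_payload is a hypothetical helper; IDs are converted to strings
    on the assumption that id_str is stored as a string.
    """
    statements = [
        {
            "statement": FRIEND_QUERY,
            "parameters": {"user_id": str(user_id), "friend_id": str(fid)},
        }
        for fid in friend_ids
    ]
    return json.dumps({"statements": statements})


# Illustrative IDs only.
payload = build_payload(42, [100002, 100003])
```

All statements in one request run in the same transaction, so a batch of friend relationships costs one round trip instead of one per friend.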

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow