Optimize this Neo4J Cypher query
Question
I'm learning Neo4J, and my toy project is to play with Twitter data. In this little script I'm using the Python libraries tweepy and py2neo to take one twitter_user and insert all of their friends.
from tweepy import Cursor
from py2neo import neo4j

# api (an authenticated tweepy API object) and graph_db are set up elsewhere
def insert_friends(twitter_user):
    for friend in Cursor(api.friends, user_id=twitter_user.id_str).items():
        n = neo4j.CypherQuery(graph_db, """
            MATCH (user),(friend)
            WHERE user.id_str={user_id_str} AND friend.id_str={friend_id_str}
            CREATE UNIQUE (user)-[:FOLLOWS]->(friend)
        """).execute_one(user_id_str=twitter_user.id_str, friend_id_str=friend.id_str)
This works fine, but I suspect it can be optimized. Namely, in the WHERE clause, I'm looking up the same user node by id_str on every iteration. How do I avoid that extra lookup each time? For instance, is there any way I could figure out in advance which node it is in Neo4J and just specify the Neo4J internal node id?
Solution
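You need to use labels and indexes! Without a label, MATCH (user),(friend) has to scan every node in the graph to evaluate the WHERE clause. With a label and an index on the property, both lookups can use the index instead. This assumes your user nodes were created with the :User label.
Namely, create the index once, then add the label to both patterns in the query: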
// run once, as a separate statement
CREATE INDEX ON :User(id_str);

MATCH (user:User),(friend:User) // add labels so it knows to use the index
WHERE user.id_str={user_id_str} AND friend.id_str={friend_id_str}
CREATE UNIQUE (user)-[:FOLLOWS]->(friend);
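On the second part of the question (looking up the same user on every iteration), one possible approach is to send all friend ids in a single parameterized statement, so the user node is matched once per batch instead of once per friend. The following is a minimal sketch, not something tested against your setup: it assumes a Neo4j version that supports UNWIND and MERGE together with the {param} syntax used above, and the names build_follow_batch, insert_friends_batched, and run_query are hypothetical helpers, not part of py2neo or tweepy.

```python
# Sketch only: build one parameterized Cypher statement for a whole batch of friends.
# Assumes UNWIND and MERGE are available and that nodes carry the :User label.
# build_follow_batch, insert_friends_batched and run_query are hypothetical names.

FOLLOW_BATCH_QUERY = """
MATCH (user:User {id_str: {user_id_str}})
UNWIND {friend_id_strs} AS friend_id_str
MATCH (friend:User {id_str: friend_id_str})
MERGE (user)-[:FOLLOWS]->(friend)
"""


def build_follow_batch(user_id_str, friend_id_strs):
    """Return (query, params) linking one user to many friends in one statement."""
    params = {
        "user_id_str": user_id_str,
        # De-duplicate while keeping order so each friend id is sent once.
        "friend_id_strs": list(dict.fromkeys(friend_id_strs)),
    }
    return FOLLOW_BATCH_QUERY, params


def insert_friends_batched(user_id_str, friend_id_strs, run_query):
    """Execute the batch through a caller-supplied run_query(query, params)."""
    query, params = build_follow_batch(user_id_str, friend_id_strs)
    return run_query(query, params)
```

Here run_query is whatever executes Cypher in your driver. With the py2neo version from the question it could be a small wrapper around neo4j.CypherQuery(graph_db, query), passing the parameters the same way execute_one receives them. The friend ids would come from collecting friend.id_str out of the tweepy Cursor loop before making the call.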
Licensed under: CC-BY-SA with attribution