Indexing nodes in Neo4j in Python

https://stackoverflow.com/questions/9428871

12-11-2019
Question

I'm building a database with tag nodes and URL nodes, where the URL nodes are connected to tag nodes. If the same URL is inserted into the database again, it should be linked to the tag node rather than creating a duplicate URL node. I think indexing would solve this problem. How can I do indexing and traversal with neo4jrestclient? A link to a tutorial would be fine. I'm currently using versae's neo4jrestclient.

Thanks

Solution

The neo4jrestclient supports both indexing and traversing the graph, but I think indexing alone could be enough for your use case. I'm not sure I have understood your problem properly, but something like this could work:

>>> from neo4jrestclient.client import GraphDatabase

>>> gdb = GraphDatabase("http://localhost:7474/db/data/")

>>> idx = gdb.nodes.indexes.create("urltags")

>>> url_node = gdb.nodes.create(url="http://foo.bar", type="URL")

>>> tag_node = gdb.nodes.create(tag="foobar", type="TAG")

We add the property count to the relationship to keep track of the number of URLs "http://foo.bar" tagged with the tag foobar.

>>> url_node.relationships.create(tag_node["tag"], tag_node, count=1)

And after that, we index the URL node according to the value of its URL.

>>> idx["url"][url_node["url"]] = url_node

Then, when we need to create a new URL node tagged with a TAG node, we first query the index to check whether that URL is already indexed. If it is, we increment the count on the existing relationship; otherwise, we create the node and index it.

>>> new_url = "http://foo.bar2"

>>> nodes = idx["url"][new_url]

>>> if len(nodes):
...     rel = nodes[0].relationships.all(types=[tag_node["tag"]])[0]
...     rel["count"] += 1
... else:
...     new_url_node = gdb.nodes.create(url=new_url, type="URL")
...     new_url_node.relationships.create(tag_node["tag"], tag_node, count=1)
...     idx["url"][new_url_node["url"]] = new_url_node
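The check-then-create logic above can be wrapped in a small helper. The following is only a sketch of that pattern: FakeIndex and FakeKey are hypothetical in-memory stand-ins that mimic the idx[key][value] lookup and assignment used above, so the example runs without a Neo4j server, and tag_url and make_node are illustrative names that are not part of neo4jrestclient.

```python
from collections import defaultdict


class FakeKey:
    """Hypothetical stand-in for idx[key]: maps a value to a list of nodes."""

    def __init__(self):
        self._hits = defaultdict(list)

    def __getitem__(self, value):
        # Return a copy of the nodes indexed under this value.
        return list(self._hits[value])

    def __setitem__(self, value, node):
        # Assignment adds the node to the index instead of replacing entries.
        self._hits[value].append(node)


class FakeIndex:
    """Hypothetical in-memory stand-in for a node index."""

    def __init__(self):
        self._keys = defaultdict(FakeKey)

    def __getitem__(self, key):
        return self._keys[key]


def tag_url(idx, create_url_node, url):
    """Return (node, created); the node is created and indexed only once."""
    hits = idx["url"][url]
    if len(hits):
        return hits[0], False
    node = create_url_node(url)
    idx["url"][url] = node
    return node, True


def make_node(url):
    # A plain dict stands in for a real node (assumption for this sketch).
    return {"url": url, "type": "URL"}


idx = FakeIndex()
first, created_first = tag_url(idx, make_node, "http://foo.bar2")
second, created_second = tag_url(idx, make_node, "http://foo.bar2")
print(created_first, created_second, first is second)
```

With a real database, the "found" branch is where the count on the existing relationship would be incremented, as in the session above.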

OTHER TIPS

An important concept is that the indexes are key/value/object triplets where the object is either a node or a relationship you want to index.

Steps to create and use the index:

Create an instance of the graph database rest client.

from neo4jrestclient.client import GraphDatabase
gdb = GraphDatabase("http://localhost:7474/db/data/")

Create a node or relationship index (a node index here).

index = gdb.nodes.indexes.create('latin_genre')

Add nodes to the index

nelly = gdb.nodes.create(name='Nelly Furtado')
shakira = gdb.nodes.create(name='Shakira')

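# The first subscript is the key and the second is the value;
# here the key happens to have the same name as the index.
index['latin_genre'][nelly.get('name')] = nelly
index['latin_genre'][shakira.get('name')] = shakira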

Fetch nodes based on the index and do further processing:

for artist in index['latin_genre']['Shakira']:
    print(artist.get('name'))

More details can be found in the notes in the webadmin:

Neo4j has two types of indexes, node and relationship indexes. With node indexes you index and find nodes, and with relationship indexes you do the same for relationships.

Each index has a provider, which is the underlying implementation handling that index. The default provider is Lucene, but you can create your own index providers if you like.

Neo4j indexes take key/value/object triplets ("object" being a node or a relationship). An index stores the key/value pair and associates it with the object provided. After you have indexed a set of key/value/object triplets, you can query the index and get back the objects that were indexed with key/value pairs matching your query.
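To make the triplet idea concrete, here is a toy in-memory model. It is only an illustration of the concept, not how Neo4j or Lucene implement it: TripletIndex and its add and get methods are hypothetical names, and plain dicts stand in for nodes.

```python
from collections import defaultdict


class TripletIndex:
    """Toy model of an index: (key, value) pairs pointing at objects."""

    def __init__(self, name):
        self.name = name
        self._entries = defaultdict(list)  # (key, value) -> [objects]

    def add(self, key, value, obj):
        # Store one key/value/object triplet.
        self._entries[(key, value)].append(obj)

    def get(self, key, value):
        # Exact-match lookup: every object indexed under this key/value pair.
        return list(self._entries.get((key, value), []))


artists = TripletIndex("artists")
nelly = {"name": "Nelly Furtado"}
shakira = {"name": "Shakira"}

# The same object can be indexed under several key/value pairs.
artists.add("name", nelly["name"], nelly)
artists.add("name", shakira["name"], shakira)
artists.add("genre", "latin", nelly)
artists.add("genre", "latin", shakira)

latin = artists.get("genre", "latin")
by_name = artists.get("name", "Shakira")
unknown = artists.get("name", "Nobody")
print(len(latin), by_name[0] is shakira, unknown)
```

One key/value pair can point at several objects, which is why a lookup returns a collection rather than a single node.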

For instance, if you have "User" nodes in your database and want to rapidly find them by username or email, you could create a node index named "Users" and, for each user, index the username and email. With the default Lucene configuration, you can then search the "Users" index with a query like: "username:bob OR email:bob@gmail.com".
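A query string of that shape can be assembled from field/term pairs. This is a minimal sketch: lucene_or_query is a hypothetical helper, not part of any library, and it assumes the terms contain no characters that are special in Lucene query syntax (spaces, colons, parentheses and so on), since it does no escaping or quoting.

```python
def lucene_or_query(pairs):
    """Join (field, term) pairs into a 'field:term OR field:term' string."""
    return " OR ".join("{0}:{1}".format(field, term) for field, term in pairs)


query = lucene_or_query([("username", "bob"), ("email", "bob@gmail.com")])
print(query)  # username:bob OR email:bob@gmail.com
```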

You can use the data browser to query your indexes this way. The syntax for the above query is "node:index:Users:username:bob OR email:bob@gmail.com".

Licensed under: CC-BY-SA with attribution