Question

I have a hashable identifier for putting things in a dictionary:

class identifier():
    def __init__(self, d):
        self.my_dict = d
        self.my_frozenset = frozenset(d.items())
    def __getitem__(self, item):
        return self.my_dict[item]
    def __hash__(self):
        return hash(self.my_frozenset)
    def __eq__(self, rhs):
        return self.my_frozenset == rhs.my_frozenset
    def __ne__(self, rhs):
       return not self == rhs

I have a node type that encapsulates identifer for purposes of hashing and equality:

class node:
    def __init__(self, id, value):
        # id is of type identifier
        self.id = id
        self.value = value
        # define other data here...
    def __hash__(self):
        return hash(self.id)
    def __eq__(self, rhs):
        if isinstance(rhs, node):
            return self.id == rhs.id
        ### for the case when rhs is an identifier; this allows dictionary
        ### node lookup of a key without wrapping it in a node
        return self.id == rhs
    def __ne__(self, rhs):
        return not self == rhs

I put some nodes into a dictionary:

d = {}
n1 = node(identifier({'name':'Bob'}), value=1)
n2 = node(identifier({'name':'Alex'}), value=2)
n3 = node(identifier({'name':'Alex', 'nationality':'Japanese'}), value=3)
d[n1] = 'Node 1'
d[n2] = 'Node 2'
d[n3] = 'Node 3'

Some time later, I have only an identifier:

my_id = identifier({'name':'Alex'})

Is there any way to efficiently lookup the node that has been stored with this identifier in this dictionary?

Please note that this is a little trickier than it sounds; I know that I can trivially use d[my_id] to retrieve the associated item 'Node 2', but I want to efficiently return a reference to n2.

I know that I could do it by looking at every element in d, but I've tried that and it's much too slow (the dictionary has thousands of items in it and I do this a fair number of times).

I know that internally dict is using the hash and eq operators for that identifier to store node n2 and its associated item, 'Node 2'. In fact, using my_id to lookup 'Node 2' actually needs to lookup n2 as an intermediate step, so this should definitely be possible.

I am using this to store data in a graph. The nodes have a lot of additional data (where I put value) that is not used in the hash. I didn't create the graph package I'm using (networkX), but I can see the dictionary that stores my nodes. I could also keep an extra dictionary around of identifiers to nodes, but this would be a pain (I'd need to wrap the graph class and rewrite all add node, remove node, add nodes from list, remove nodes from list, add edge, etc. type functions to keep that dictionary up to date).

This is quite the puzzle. Any help would be really appreciated!

Was it helpful?

Solution

Instead of

d[n1] = 'Node 1'

use:

d[n1] = ('Node 1', n1)

Then you have access to n1 no matter how you found the value.

I don't believe there is a way with dictionaries to retrieve the original key k1 if all you have is a k2 equal to k1.

OTHER TIPS

Have two dictionaries. - Whenever you add a key/value to the primary dictionary, also add them to the reverse dictionary, but with the key/value swapped.

For example:

# When adding a value:
d[n2] = value;
# Must also add to the reverse dictionary:
rev[value] = d

# This means that:
value = d[n2]
# Will be able to efficiently find out the key used with:
key = rev[value]

Here is a way to use a custom node object with NetworkX. If you store the object in the "node attribute" dictionary you can use it as a reverse dictionary to get the object back by referencing the id. It's a little awkward but it works.

import networkx as nx

class Node(object):

    def __init__(self,id,**attr):
        self.id=id
        self.properties={}
        self.properties.update(attr)

    def __hash__(self):
        return self.id

    def __eq__(self,other):
        return self.id==other.id

    def __repr__(self):
        return str(self.id)

    def __str__(self):
        return str(self.id)


G=nx.Graph()
# add two nodes
n1=Node(1,color='red') # the node id must be hashable
n2=Node(2,color='green')
G.add_node(n1,obj=n1)
G.add_node(n2,obj=n2)

# check what we have
print G.nodes() # 1,2
print n1,n1.properties['color'] # 1,red
print n1==n2   # False 
for n in G:
    print n.properties['color']
print Node(1) in G # True
# change color of node 1
n1.properties['color']='blue'
for n in G:
    print n.properties

# use "node attribute" data in NetworkX to retrieve object
n=G.node[Node(1)]['obj']
print type(n) # <class '__main__.Node'>
print n # 1
print n.id # 1
print n.properties # {'color': 'blue'}

You can of course define a function that makes this simpler:

   def get_node(G,n):
        return G.node[Node(1)]['obj']

    n=get_node(G,1)
    print n.properties

The thing is, there is no guaranty that the key is effectively a Node. What if you do

d[my_id]=d[my_id] 

Everything would still work perfectly except now, your key is an Identifier and not a Node. Allowing two classes to "equal" like this is really dangerous. If you really need to find a Node by it's name that should be done in the Node class or externaly, but shouldn't depend of the presence of not of the node in a hash.

If you can't modify that (because you can't modify the code), then I guess you are stuck to do the ineffecient way

using my_id to lookup 'Node 2' actually needs to lookup n2 as an intermediate step

This is not true. A dictionary is a hashtable: it maps the hash of an item to (a bucket of) entries. When you ask for d[my_id], Python first gets hash(my_id) and then looks that up in d. You are getting confused because you have that hash(n1) == hash(id1), which is a Very Bad Thing.

You are asking for a mapping between identifiers and nodes. If you want one of these, you will have to create one yourself.


Are the identifiers all matched with nodes upon creation, or do you construct them later? That is, are you really asking to be able to find the node with identifier identifier({'name':'Alex'}), or has that identifier already been created and added to a node? If the latter, you could do the following:

class Node:
    def __init__(self, id, value):
        id.parent = self
        ...
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top