Understanding python object membership for sets
-
02-10-2019 - |
Question
If I understand correctly, the __cmp__() function of an object is called in order to evaluate all objects in a collection while determining whether an object is a member, or 'in', the collection. However, this does not seem to be the case for sets:
class MyObject(object):
def __init__(self, data):
self.data = data
def __cmp__(self, other):
return self.data-other.data
a = MyObject(5)
b = MyObject(5)
print a in [b] //evaluates to True, as I'd expect
print a in set([b]) //evaluates to False
How is an object membership tested in a set, then?
Solution
>>> xs = []
>>> set([xs])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
There you are. Sets use hashes, very similar to dicts. This help performance extremely (membership tests are O(1), and many other operations depend on membership tests), and it also fits the semantics of sets well: Set items must be unique, and different items will produce different hashes, while same hashes indicate (well, in theory) duplicates.
Since the default __hash__
is just id
(which is rather stupid imho), two instances of a class that inherits object
's __hash__
will never hash to the same value (well, unless adress space is larger than the sizeof
the hash).
OTHER TIPS
Adding a __hash__
method to your class yields this:
class MyObject(object):
def __init__(self, data):
self.data = data
def __cmp__(self, other):
return self.data - other.data
def __hash__(self):
return hash(self.data)
a = MyObject(5)
b = MyObject(5)
print a in [b] # True
print a in set([b]) # Also True!
As others pointed, your objects don't have a __hash__
so they use the default id as a hash, and you can override it as Nathon suggested, BUT read the docs about __hash__
, specifically the points about when you should and should not do that.
A set uses a dict behind the scenes, so the "in" statement is checking whether the object exists as a key in the dict. Since your object doesn't implement a hash function, the default hash function for objects uses the object's id. So even though a and b are equivalent, they're not the same object, and that's what's being tested.