Why do we bypass instance attributes during implicit lookup of special methods?
29-06-2021
Question
From the ‘Special method lookup for new-style classes’ section of the ‘Data model’ chapter in the Python documentation (bold emphasis mine):
For new-style classes, implicit invocations of special methods are only guaranteed to work correctly if defined on an object’s type, not in the object’s instance dictionary. That behaviour is the reason why the following code raises an exception (unlike the equivalent example with old-style classes):
>>> class C(object):
...     pass
...
>>> c = C()
>>> c.__len__ = lambda: 5
>>> len(c)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'C' has no len()
The rationale behind this behaviour lies with a number of special methods such as __hash__() and __repr__() that are implemented by all objects, including type objects. If the implicit lookup of these methods used the conventional lookup process, they would fail when invoked on the type object itself:
>>> 1 .__hash__() == hash(1)
True
>>> int.__hash__() == hash(int)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor '__hash__' of 'int' object needs an argument
Incorrectly attempting to invoke an unbound method of a class in this way is sometimes referred to as ‘metaclass confusion’, and is avoided by bypassing the instance when looking up special methods:
>>> type(1).__hash__(1) == hash(1)
True
>>> type(int).__hash__(int) == hash(int)
True
I don't quite understand the words in bold…
Solution
To understand what's going on here, you need to have a (basic) understanding of the conventional attribute lookup process. Take a typical introductory object-oriented programming example - fido is a Dog:
class Dog(object):
    pass

fido = Dog()
If we say fido.walk(), the first thing Python does is to look for a function called walk in fido (as an entry in fido.__dict__) and call it with no arguments - so, one that's been defined something like this:
def walk():
    print("Yay! Walking! My favourite thing!")

fido.walk = walk
and fido.walk() will work. If we hadn't done that, it would look for an attribute walk in type(fido) (which is Dog) and call it with the instance as the first argument (i.e., self) - that is triggered by the usual way we define methods in Python:
class Dog(object):
    def walk(self):
        print("Yay! Walking! My favourite thing!")
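The two lookup paths described above can be sketched side by side. This is only an illustration, written in Python 3 syntax (an assumption; the snippets above use Python 2), and the names instance_walk and rex are hypothetical additions:

```python
class Dog(object):
    def walk(self):
        # Found on the class: Python binds it, so self is passed automatically.
        return "class walk, self is a Dog: %s" % isinstance(self, Dog)


fido = Dog()
rex = Dog()


def instance_walk():
    # Stored in fido.__dict__: called as-is, with no self argument.
    return "instance walk, no self"


fido.walk = instance_walk  # shadows Dog.walk for fido only

print(fido.walk())  # uses the entry in fido.__dict__
print(rex.walk())   # falls back to Dog.walk, bound to rex
```

Only fido has a walk entry in its own __dict__; rex still goes through the class.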
Now, when you call repr(fido), it ends up calling the special method __repr__. It might be (poorly, but illustratively) defined like this:
class Dog(object):
    def __repr__(self):
        return 'Dog()'
But, the bold text is saying that it also makes sense to do this:
repr(Dog)
Under the lookup process I just described, the first thing it looks for is a method called __repr__ assigned to Dog... and hey, look, there is one, because we just poorly but illustratively defined it. So, Python calls:
Dog.__repr__()
And it blows up in our face:
>>> Dog.__repr__()
Traceback (most recent call last):
  File "<pyshell#38>", line 1, in <module>
    Dog.__repr__()
TypeError: __repr__() takes exactly 1 argument (0 given)
because __repr__() expects a Dog instance to be passed to it as its self argument. We could do this to make it work:
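class Dog(object):
    def __repr__(self=None):
        if self is None:
            # called on the class itself: return repr of Dog
            raise NotImplementedError
        # called on an instance: return repr of self
        return 'Dog()'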
But, then, we would need to do this every time we write a custom __repr__ function. That it needs to know how to find the __repr__ of the class is a problem, but not much of a one - it can just delegate to Dog's own class (type(Dog)) and call its __repr__ with Dog as its self-argument:
if self is None:
    return type(Dog).__repr__(Dog)
First, this breaks if the class name changes in the future, since we've had to mention it twice in the same line. The bigger problem is that this is basically going to be boilerplate: 99% of implementations will just delegate up the chain, or forget to and hence be buggy. So, Python takes the approach described in those paragraphs - repr(foo) skips finding an __repr__ attached to foo, and goes straight to:
type(foo).__repr__(foo)
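A minimal sketch of that rule in action, assuming Python 3 (the behaviour is the same for new-style classes in Python 2). The lambda attached to fido is a hypothetical stand-in for any instance-level __repr__:

```python
class Dog(object):
    def __repr__(self):
        return "Dog()"


fido = Dog()

# An instance-level __repr__ can be stored, but repr() never consults it.
fido.__repr__ = lambda: "instance-level repr"

explicit = fido.__repr__()  # normal lookup: finds the instance attribute
implicit = repr(fido)       # implicit lookup: goes through type(fido)
class_repr = repr(Dog)      # works too: uses type(Dog).__repr__(Dog)

print(explicit)
print(implicit)
print(implicit == type(fido).__repr__(fido))
print(class_repr == type(Dog).__repr__(Dog))
```

The explicit call and the implicit call disagree for fido, which is exactly the bypass the documentation describes.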
OTHER TIPS
What you have to remember is that classes are instances of their metaclass. Some operations need to be performed not just on instances, but on types as well. If the method found on the object itself were run (when that object is really a class), it would fail, since that method requires an instance of the class rather than an instance of the metaclass.
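class MC(type):
    def foo(self):
        print('foo')

class C(object):
    __metaclass__ = MC  # Python 2 metaclass syntax
    def bar(self):
        print('bar')

C.foo()    # prints foo: found on MC and bound to C
C().bar()  # prints bar: found on C and bound to the instance
C.bar()    # raises TypeError: bar needs an instance of C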
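The snippet above relies on the Python 2 __metaclass__ attribute, which Python 3 ignores. A rough Python 3 equivalent might look like the following; the cls parameter name and the error variable are illustrative choices, and the methods return strings instead of printing so the results are easy to check:

```python
class MC(type):
    def foo(cls):
        return "foo"


class C(metaclass=MC):
    def bar(self):
        return "bar"


print(C.foo())    # found on type(C), bound to C
print(C().bar())  # found on C, bound to the new instance

error = None
try:
    C.bar()       # found in C's own namespace: a plain function, nothing is bound
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```

Note that foo is reachable from C but not from instances of C, because instance lookup does not continue into the metaclass.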
Normal attribute retrieval obj.attr looks up attr in the instance attributes and class attributes of obj. It is defined in object.__getattribute__ and type.__getattribute__.
An implicit special method call special(obj, *args, **kwargs) (e.g. hash(1)) looks up __special__ (e.g. __hash__) in the class attributes of obj (e.g. 1) and calls it, bypassing the instance attributes of obj instead of performing the normal attribute retrieval obj.__special__. The rationale is as follows. An instance attribute of obj may require a receiver argument (usually called self) that is an instance of obj in order to be called (e.g. a function attribute, when obj is itself a class), and special(obj, *args, **kwargs) does not provide one. A class attribute of obj, by contrast, may require a receiver argument that is an instance of the class type(obj) (e.g. a function attribute), and special(obj, *args, **kwargs) does provide one: obj.
Example
The special method __hash__ takes a single argument. Compare these two expressions:
>>> 1 .__hash__
<method-wrapper '__hash__' of int object at 0x103c1f930>
>>> int.__hash__
<slot wrapper '__hash__' of 'int' objects>
- The first expression retrieves the method vars(type(1))['__hash__'].__get__(1) bound to 1 from the class attribute vars(type(1))['__hash__']. So the class attribute requires a receiver argument which is an instance of type(1) to be called, and we have already provided one: 1.
- The second expression retrieves the function vars(int)['__hash__'].__get__(None, int) from the instance attribute vars(int)['__hash__']. So the instance attribute requires a receiver argument which is an instance of int to be called, and we have not provided one yet.
>>> 1 .__hash__()
1
>>> int.__hash__(1)
1
Since the built-in function hash takes a single argument, hash(1) can provide the 1 required in the first call (a class attribute call) while hash(int) cannot provide the 1 required in the second call (an instance attribute call). Consequently, hash(obj) should bypass the instance attribute vars(obj)['__hash__'] and directly access the class attribute vars(type(obj))['__hash__']:
>>> hash(1) == vars(type(1))['__hash__'].__get__(1)()
True
>>> hash(int) == vars(type(int))['__hash__'].__get__(int)()
True
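The two equalities above can be generalized into a small helper. This is a rough sketch under stated assumptions, not how the interpreter is actually implemented: call_special is a hypothetical name, the code assumes Python 3, and it walks type(obj).__mro__ because the special method may be inherited from a base class rather than stored directly in vars(type(obj)).

```python
def call_special(obj, name, *args):
    """Roughly emulate an implicit special method call such as hash(obj)."""
    cls = type(obj)
    # Search the class and its bases; vars(obj) itself is never consulted.
    for base in cls.__mro__:
        if name in vars(base):
            attr = vars(base)[name]
            break
    else:
        raise TypeError("%r object has no %s" % (cls.__name__, name))
    # Bind obj as the receiver through the descriptor protocol, then call.
    if hasattr(type(attr), "__get__"):
        attr = type(attr).__get__(attr, obj, cls)
    return attr(*args)


print(call_special(1, "__hash__") == hash(1))
print(call_special(int, "__hash__") == hash(int))
print(call_special([1, 2, 3], "__len__") == len([1, 2, 3]))
```

Because the search starts at type(obj), a __len__ assigned to an instance is not found, which matches the TypeError in the question.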