Question

I'm using Python 3.3. This problem may not exist with Python 2.x's pickle protocol, but I haven't verified that.

Suppose I've created a dict subclass that counts every time a key is updated. Something like this:

class Foo(dict):
    def __init__(self):
        self.counter = 0

    def __setitem__(self, key, value):
        print(key, value, self.__dict__)
        if key == 'bar':
            self.counter += 1
        super(Foo, self).__setitem__(key, value)

You might use it like this:

>>> f = Foo()
>>> assert f.counter == 0
>>> f['bar'] = 'baz'
... logging output...        
>>> assert f.counter == 1

Now let's pickle and unpickle it:

>>> import pickle
>>> f_str = pickle.dumps(f)
>>> f_new = pickle.loads(f_str)
bar baz {}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "test.py", line 133, in __setitem__
    self.counter += 1
AttributeError: 'Foo' object has no attribute 'counter'

I think the print() in __setitem__ shows the problem: pickle.loads attempts to write the dictionary's keys before it writes the object's attributes... at least I think that's what's happening. It's pretty easy to verify if you remove the self.counter reference in Foo.__setitem__():

>>> f_mod = ModifiedFoo()
>>> f_mod['bar'] = 'baz'
>>> f_mod_str = pickle.dumps(f_mod)
>>> f_mod_new = pickle.loads(f_mod_str)
bar baz {}
>>> assert f_mod_new.counter == 0
>>>

Is this just a byproduct of the pickle protocol? I've tried variations on __setstate__ to let it unpickle correctly, but as far as I can tell, it hits the __setitem__ error before __setstate__ is even called. Is there any way I can modify this object to allow unpickling?


Solution

As stated in the pickle documentation:

When a pickled class instance is unpickled, its __init__() method is normally not invoked.

In your case you do want to invoke __init__. However, since your class is a new-style class, you cannot use __getinitargs__ (which isn't supported in Python 3 anyway). You could try writing custom __getstate__ and __setstate__ methods:

class Foo(dict):
    def __init__(self):
        self.counter = 0
    def __getstate__(self):
        return (self.counter, dict(self))
    def __setstate__(self, state):
        self.counter, data = state
        self.update(data)  # will *not* call __setitem__

    def __setitem__(self, key, value):
        self.counter += 1
        super(Foo, self).__setitem__(key, value)

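However, this still doesn't work. Because you are subclassing dict, pickle restores the dictionary items through a dedicated path that calls __setitem__ before the state is applied. The __getstate__ method is called when pickling, but on unpickling __setitem__ raises before __setstate__ is ever reached.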

You can work around this by defining the __reduce__ method:

class Foo(dict):
    def __init__(self):
        self.counter = 0
    def __getstate__(self):
        return (self.counter, dict(self))
    def __setstate__(self, state):
        self.counter, data = state
        self.update(data)
    def __reduce__(self):
        return (Foo, (), self.__getstate__())

    def __setitem__(self, key, value):
        self.counter += 1
        super(Foo, self).__setitem__(key, value)
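
As a usage check, here is a minimal round-trip sketch of the __reduce__ approach above. It makes two assumptions that are not in the answer: __reduce__ returns type(self) instead of the hard-coded Foo so that subclasses unpickle as themselves, and Bar is a hypothetical subclass added only to illustrate that.

```python
import pickle


class Foo(dict):
    def __init__(self):
        self.counter = 0

    def __setitem__(self, key, value):
        self.counter += 1
        super().__setitem__(key, value)

    def __getstate__(self):
        # State is the counter plus a plain-dict copy of the items.
        return (self.counter, dict(self))

    def __setstate__(self, state):
        self.counter, data = state
        # dict.update() does not go through the overridden __setitem__,
        # so the restored counter is not incremented again.
        self.update(data)

    def __reduce__(self):
        # Variation on the answer: type(self) instead of Foo (assumption),
        # so subclasses are rebuilt with their own class.
        return (type(self), (), self.__getstate__())


class Bar(Foo):
    """Hypothetical subclass, used only to exercise type(self)."""


b = Bar()
b['x'] = 1
b['y'] = 2

restored = pickle.loads(pickle.dumps(b))
print(type(restored).__name__, dict(restored), restored.counter)
```

Because the three-element tuple carries no separate dict items, the unpickler calls the class with no arguments (running __init__) and then hands the state to __setstate__.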

Other tips

You are subclassing dict, and the pickle protocol will use the dedicated dict handler to store the keys and values in the resulting pickle data, using a different set of opcodes to restore these to your object again.

As a result, __setstate__ is only going to be called after restoring the dictionary keys, and the state contains only the counter attribute.
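
The ordering described above can be observed directly. The following is a small sketch, not part of the original answer: Traced and the calls list are hypothetical names, and the class only records which hooks the unpickler invokes and in what order.

```python
import pickle

calls = []  # hypothetical log of hook invocations


class Traced(dict):
    def __setitem__(self, key, value):
        calls.append('__setitem__')
        super().__setitem__(key, value)

    def __setstate__(self, state):
        calls.append('__setstate__')
        # Default state for an instance like this is its __dict__.
        self.__dict__.update(state)


t = Traced()
t['bar'] = 'baz'
t.counter = 1

calls.clear()  # only record what happens during unpickling
restored = pickle.loads(pickle.dumps(t))
print(calls)
```

With the default protocol on CPython 3, the log shows __setitem__ before __setstate__, which is why a __setitem__ that depends on instance attributes fails before __setstate__ gets a chance to restore them.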

There are two work-arounds here:

  1. Make your counter code resilient in the face of __init__ not being called:

    class Foo(dict):
        counter = 0
    
        def __setitem__(self, key, value):
            print(key, value, self.__dict__)
            if key == 'bar':
                self.counter += 1
            super(Foo, self).__setitem__(key, value)
    

    Here counter is a class attribute and thus always present. You could also use:

    self.counter = getattr(self, 'counter', 0) + 1
    

    to ensure there is a default value for the missing attribute.

  2. Provide a __getnewargs__ method; it can return an empty tuple, but specifying it ensures that __new__ is called when unpickling, which in turn could call __init__:

    class Foo(dict):
        def __new__(cls, *args, **kw):
            f = super().__new__(cls, *args, **kw)
            f.__init__()
            return f
    
        def __init__(self):
            self.counter = 0
    
        def __setitem__(self, key, value):
            print(key, value, self.__dict__)
            if key == 'bar':
                self.counter += 1
            super(Foo, self).__setitem__(key, value)
    
        def __getnewargs__(self):
            # Call __new__ (and thus __init__) on unpickling.
            return ()
    

    Note that after __init__ is called, the unpickler will still set all the keys and then restore __dict__. self.counter will reflect the correct value in the end.
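
The getattr variant mentioned under the first work-around is not covered by the demos, so here is a minimal sketch of it. It keeps the original __init__ and simply falls back to 0 when the attribute is missing; the print() call from the question is left out for brevity.

```python
import pickle


class Foo(dict):
    def __init__(self):
        self.counter = 0

    def __setitem__(self, key, value):
        if key == 'bar':
            # During unpickling __init__ has not run, so 'counter' may be
            # missing; getattr supplies a default instead of raising.
            self.counter = getattr(self, 'counter', 0) + 1
        super().__setitem__(key, value)


f = Foo()
f['bar'] = 'baz'
f['bar'] = 'foo'

new_f = pickle.loads(pickle.dumps(f))
print(new_f.counter, dict(new_f))
```

The temporary count written while the keys are restored is overwritten when the unpickler restores __dict__ afterwards, so the final counter matches the pickled object.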

Demos:

1st approach:

>>> import pickle
>>> class Foo(dict):
...     counter = 0
...     def __setitem__(self, key, value):
...         print(key, value, self.__dict__)
...         if key == 'bar':
...             self.counter += 1
...         super(Foo, self).__setitem__(key, value)
... 
>>> f = Foo()
>>> f['bar'] = 'baz'
bar baz {}
>>> f.counter
1
>>> f['bar'] = 'foo'
bar foo {'counter': 1}
>>> f.counter
2
>>> f_str = pickle.dumps(f)
>>> new_f = pickle.loads(f_str)
bar foo {}
>>> new_f.counter
2
>>> new_f.items()
dict_items([('bar', 'foo')])

2nd approach:

>>> import pickle
>>> class Foo(dict):
...     def __new__(cls, *args, **kw):
...         f = super().__new__(cls, *args, **kw)
...         f.__init__()
...         return f
...     def __init__(self):
...         self.counter = 0
...     def __setitem__(self, key, value):
...         print(key, value, self.__dict__)
...         if key == 'bar':
...             self.counter += 1
...         super(Foo, self).__setitem__(key, value)
...     def __getnewargs__(self):
...         return ()
... 

>>> f = Foo()
>>> f['bar'] = 'baz'
bar baz {'counter': 0}
>>> f.counter
1
>>> f['bar'] = 'foo'
bar foo {'counter': 1}
>>> f.counter
2
>>> f_str = pickle.dumps(f)
>>> new_f = pickle.loads(f_str)
bar foo {'counter': 0}
>>> new_f.counter
2
>>> new_f.items()
dict_items([('bar', 'foo')])

You can add pickle support to your dictionary subclass by adding a __reduce__() method which will be used to get arguments to pass to a user defined function to reconstitute the object when it's unpickled.

However, since your class is a dict subclass, it wasn't quite as trivial to implement as I originally thought, although it's fairly simple once I figured out what needed to be done. Here's what I came up with. Note that the _Foo_unpickle_helper() function can't be a regular or static method of the class, which is why it's defined at the module level:

class Foo(dict):
    def __init__(self):
        self.counter = 0

    def __setitem__(self, key, value):
        print(key, value, self.__dict__)
        if key == 'bar':
            self.counter += 1
        super(Foo, self).__setitem__(key, value)

    def __reduce__(self):
        return _Foo_unpickle_helper, (self.counter, iter(self.items()))

def _Foo_unpickle_helper(counter, items):
    """ Reconstitute a Foo instance from the arguments. """
    foo = Foo()
    foo.counter = counter
    foo.update(items)  # apparently doesn't call __setitem__()...
    return foo

f = Foo()
f['bar'] = 'baz'
f['bar'] = 'baz'
print('f: {}'.format(f))
print('f.counter: {}'.format(f.counter))

import pickle
f_str = pickle.dumps(f)
print('----------')
f_new = pickle.loads(f_str)
print('f_new: {}'.format(f_new))
print('f_new.counter: {}'.format(f_new.counter))

Output:

bar baz {'counter': 0}
bar baz {'counter': 1}
f: {'bar': 'baz'}
f.counter: 2
----------
f_new: {'bar': 'baz'}
f_new.counter: 2
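
On the "apparently doesn't call __setitem__()" comment in the helper: in CPython, dict.update() writes entries directly and does not dispatch to an overridden __setitem__. A minimal sketch to confirm this, where Loud and its seen list are hypothetical names used only for illustration:

```python
class Loud(dict):
    def __init__(self):
        super().__init__()
        self.seen = []  # keys that passed through __setitem__

    def __setitem__(self, key, value):
        self.seen.append(key)
        super().__setitem__(key, value)


d = Loud()
d['a'] = 1          # goes through the overridden __setitem__
d.update({'b': 2})  # bypasses it
d.update(c=3)       # bypasses it as well

print(d.seen)   # ['a']
print(dict(d))  # {'a': 1, 'b': 2, 'c': 3}
```

This is what lets the helper restore the items without incrementing the counter a second time.
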
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow