Question

Update 6/8/17

Three years on, my PR, a temporary fix that enforces the output order, is still pending. Stream-Framework might reconsider its design of using content as the key for notifications. GitHub Issue #153 references this.

Question

See the following sample:

import pickle
x = {'order_number': 'X', 'deal_url': 'J'}

pickle.dumps(x)
pickle.dumps(pickle.loads(pickle.dumps(x)))
pickle.dumps(pickle.loads(pickle.dumps(pickle.loads(pickle.dumps(x)))))

Results:

(dp0\nS'deal_url'\np1\nS'J'\np2\nsS'order_number'\np3\nS'X'\np4\ns.
(dp0\nS'order_number'\np1\nS'X'\np2\nsS'deal_url'\np3\nS'J'\np4\ns.
(dp0\nS'deal_url'\np1\nS'J'\np2\nsS'order_number'\np3\nS'X'\np4\ns.

Clearly, the serialized output changes from one dump to the next. When I remove a character from either of the keys, this doesn't happen. I discovered this because Stream-Framework uses the pickled output as the key for storing notifications in its k/v store. I will open a pull request once we have a better understanding of what is going on here. I have found two ways to prevent it:

A - Convert back to a dictionary after sorting (yes, this somehow provides the intended side effect)

import operator
sorted_x = dict(sorted(x.iteritems(), key=operator.itemgetter(1)))

B - Remove underscores from the keys (but I'm not sure this always works)

So what causes this mysterious dictionary ordering behaviour in pickle?

Proof that sorting the dict's items before dumping makes dumps produce the same result each time:

import operator
x = dict(sorted(x.iteritems(), key=operator.itemgetter(1)))

pickle.dumps(x)
"(dp0\nS'order_number'\np1\nS'X'\np2\nsS'deal_url'\np3\nS'J'\np4\ns."

x = pickle.loads(pickle.dumps(x))
x = dict(sorted(x.iteritems(), key=operator.itemgetter(1)))

pickle.dumps(x)
"(dp0\nS'order_number'\np1\nS'X'\np2\nsS'deal_url'\np3\nS'J'\np4\ns."

Solution

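Dictionaries are unordered data structures. This means that the iteration order is arbitrary, and pickle stores the items in whatever order the dictionary yields them. You can use collections.OrderedDict if you want a dictionary that preserves the order in which keys were inserted (it does not sort the keys for you).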
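As a minimal sketch of the OrderedDict suggestion, the example below builds an OrderedDict from the key-sorted items and round-trips it through pickle. It is written for Python 3 (the question's code is Python 2, so `items()` stands in for `iteritems()`), and the variable names are illustrative only. Sorting by key is an assumption about which order is wanted; OrderedDict only remembers whatever order it is given.

```python
import pickle
from collections import OrderedDict

x = {'order_number': 'X', 'deal_url': 'J'}

# Sort the items by key first, then let OrderedDict remember that order.
ordered = OrderedDict(sorted(x.items()))

# The key order survives a pickle round trip.
restored = pickle.loads(pickle.dumps(ordered))

print(list(ordered))
print(list(restored))
```

Because the order is fixed before pickling, it no longer depends on how the plain dict happened to lay out its keys.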

Any order you think you see when you're playing around in the interpreter is just the interpreter playing nice with you.

From the documentation of dict:

It is best to think of a dictionary as an unordered set of key: value pairs, with the requirement that the keys are unique (within one dictionary)

Remember that the functions dict.keys(), dict.values() and dict.items() also return their respective values in arbitrary order.
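If the pickled bytes are used as a storage key, as in the question, one way to remove the dependence on dictionary order is to pickle the sorted items rather than the dict itself. The sketch below is for Python 3 and assumes a flat dict with sortable keys. The helper names `stable_dumps` and `stable_loads` are hypothetical; they are not part of pickle or Stream-Framework.

```python
import pickle


def stable_dumps(d, protocol=2):
    """Pickle the key-sorted items of a flat dict (hypothetical helper)."""
    # Dict keys are unique, so sorting (key, value) pairs orders them by key.
    return pickle.dumps(sorted(d.items()), protocol)


def stable_loads(data):
    """Rebuild the dict from the pickled list of (key, value) pairs."""
    return dict(pickle.loads(data))


a = {'order_number': 'X', 'deal_url': 'J'}
b = {'deal_url': 'J', 'order_number': 'X'}  # same content, other insertion order

# Insertion order no longer affects the serialized bytes.
print(stable_dumps(a) == stable_dumps(b))

# A load/dump round trip gives back the same dict.
print(stable_loads(stable_dumps(a)) == a)
```

The protocol is pinned because different pickle protocols produce different bytes for the same data. This only removes the ordering dependence; it is not a guarantee that pickle output is canonical for arbitrary objects.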

Licensed under: CC-BY-SA with attribution