Question

Using the shelve module has given me some surprising behavior. keys(), iter(), and iteritems() don't return all the entries in the shelf! Here's the code:

import datetime
import shelve

cache = shelve.open('my.cache')
# ...
cache[url] = (datetime.datetime.today(), value)

later:

cache = shelve.open('my.cache')
urls = ['accounts_with_transactions.xml', 'targets.xml', 'profile.xml']
try:
    print list(cache.keys()) # doesn't return all the keys!
    print [url for url in urls if cache.has_key(url)]
    print list(cache.keys())
finally:
    cache.close()

and here's the output:

['targets.xml']
['accounts_with_transactions.xml', 'targets.xml']
['targets.xml', 'accounts_with_transactions.xml']

Has anyone run into this before, and is there a workaround without knowing all possible cache keys a priori?

Solution

According to the Python library reference:

...The database is also (unfortunately) subject to the limitations of dbm, if it is used — this means that (the pickled representation of) the objects stored in the database should be fairly small...
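
Which limitations apply depends on the dbm backend the shelf actually uses. As a rough check, and assuming Python 3 (where the standard library provides dbm.whichdb; Python 2 had a separate whichdb module), something like the following reports the backend. The helper name shelf_backend is made up for illustration.

```python
import dbm
import os
import shelve
import tempfile


def shelf_backend(path):
    """Return the name of the dbm module that created the shelf at `path`.

    Hypothetical helper: it only wraps dbm.whichdb, which returns None if
    the file cannot be found or read, and '' if the format is not recognised.
    """
    return dbm.whichdb(path)


if __name__ == "__main__":
    # Create a throwaway shelf so the check has something to inspect.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "my.cache")
        with shelve.open(path, "c") as cache:
            cache["targets.xml"] = "small value"
        print(shelf_backend(path))
```

The printed name varies by platform and Python build, so treat it as a hint about which size limits might be in play rather than a guarantee.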

The following script reproduces the 'bug' by storing large values:

import shelve

a = 'trxns.xml'
b = 'foobar.xml'
c = 'profile.xml'

urls = [a, b, c]
cache = shelve.open('my.cache', 'c')

try:
    cache[a] = a*1000    # 9,000-character value
    cache[b] = b*10000   # 100,000-character value
finally:
    cache.close()


cache = shelve.open('my.cache', 'c')

try:
    print cache.keys()
    print [url for url in urls if cache.has_key(url)]
    print cache.keys()
finally:
    cache.close()

with the output:

[]
['trxns.xml', 'foobar.xml']
['foobar.xml', 'trxns.xml']

The answer, therefore, is not to store anything big, such as raw XML, in a shelf; store the results of calculations instead.
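
One way to follow this advice is to reduce each document to a small record before caching it. Here is a minimal sketch in Python 3 syntax; the helper name cache_summary and the choice of fields (timestamp, size, SHA-1 digest) are illustrative assumptions, standing in for whatever results your application really needs.

```python
import datetime
import hashlib
import shelve


def cache_summary(cache, url, raw_xml):
    """Store a small summary of `raw_xml` under `url` instead of the raw text.

    Hypothetical helper: the fields below are placeholders for the actual
    calculation results an application would want to keep.
    """
    data = raw_xml.encode("utf-8")
    cache[url] = {
        "fetched": datetime.datetime.today(),
        "size": len(data),
        "sha1": hashlib.sha1(data).hexdigest(),
    }
    # Read the record back so the caller sees what was really stored.
    return cache[url]
```

With a shelf opened via shelve.open('my.cache'), a call such as cache_summary(cache, 'targets.xml', xml_text) keeps each stored value down to a few hundred bytes, however large the document is.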

OTHER TIPS

Seeing your examples, my first thought is that cache.has_key() has side effects, i.e. that the call itself adds keys to the cache. What do you get for the following?

print cache.has_key('xxx')
print list(cache.keys())
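
To run that check in one go, a sketch along these lines captures the key list before and after a lookup so the two can be compared. It uses Python 3 syntax, where has_key() is spelled with the in operator, and the helper name keys_around_lookup is made up.

```python
def keys_around_lookup(cache, probe):
    """Return (keys_before, found, keys_after) for a membership test on `probe`.

    Hypothetical diagnostic helper: if the lookup had side effects, the two
    key lists would differ.
    """
    before = sorted(cache.keys())
    found = probe in cache
    after = sorted(cache.keys())
    return before, found, after
```

If the two lists differ for a key that was never stored, such as 'xxx', the lookup is changing what the shelf reports.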
Licensed under: CC-BY-SA with attribution