Question
Using the shelve module has given me some surprising behavior. keys(), iter(), and iteritems() don't return all the entries in the shelf! Here's the code:
import datetime
import shelve

cache = shelve.open('my.cache')
# ...
cache[url] = (datetime.datetime.today(), value)
later:
cache = shelve.open('my.cache')
urls = ['accounts_with_transactions.xml', 'targets.xml', 'profile.xml']
try:
    print list(cache.keys())  # doesn't return all the keys!
    print [url for url in urls if cache.has_key(url)]
    print list(cache.keys())
finally:
    cache.close()
and here's the output:
['targets.xml']
['accounts_with_transactions.xml', 'targets.xml']
['targets.xml', 'accounts_with_transactions.xml']
Has anyone run into this before, and is there a workaround without knowing all possible cache keys a priori?
Solution
According to the Python library reference:
...The database is also (unfortunately) subject to the limitations of dbm, if it is used — this means that (the pickled representation of) the objects stored in the database should be fairly small...
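Which dbm implementation sits behind a shelf depends on what the Python build provides, so it can help to check before reasoning about its limits. Below is a minimal sketch, assuming Python 3, where the standard library exposes dbm.whichdb (Python 2 has a separate whichdb module instead). The helper name shelf_backend and the file name demo.cache are made up for illustration.

```python
import dbm
import os
import shelve
import tempfile


def shelf_backend(filename):
    """Return the name of the dbm module that created the shelf at filename.

    dbm.whichdb returns None if the file cannot be opened and '' if the
    format cannot be guessed.
    """
    return dbm.whichdb(filename)


# Create a throwaway shelf in a temporary directory (hypothetical name).
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'demo.cache')
cache = shelve.open(path, 'c')
try:
    cache['targets.xml'] = 'small value'
finally:
    cache.close()

# The printed module name depends on how this Python was built.
print(shelf_backend(path))
```

If the reported module is one of the C-based dbm variants, the size caveat quoted above is the first thing to suspect.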
The following script reproduces the 'bug':
import shelve
a = 'trxns.xml'
b = 'foobar.xml'
c = 'profile.xml'
urls = [a, b, c]
cache = shelve.open('my.cache', 'c')
try:
    cache[a] = a*1000
    cache[b] = b*10000
finally:
    cache.close()

cache = shelve.open('my.cache', 'c')
try:
    print cache.keys()
    print [url for url in urls if cache.has_key(url)]
    print cache.keys()
finally:
    cache.close()
with the output:
[]
['trxns.xml', 'foobar.xml']
['foobar.xml', 'trxns.xml']
The answer, therefore, is not to store anything big in a shelf, such as raw XML, but rather the results of calculations.
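One way to follow that advice while still caching the raw documents is to write each large value to its own file and shelve only a small record pointing at it. The sketch below is an illustration under assumptions, not a tested fix for the behavior above: it is written for Python 3, and the helper names cache_put and cache_get, the payload_dir argument, and the SHA-1 file naming are all hypothetical choices.

```python
import datetime
import hashlib
import os
import shelve


def cache_put(cache, payload_dir, url, value):
    """Write the large value to its own file; shelve only a small record."""
    # Hash the key so that any URL maps to a safe file name.
    name = hashlib.sha1(url.encode('utf-8')).hexdigest()
    path = os.path.join(payload_dir, name)
    with open(path, 'w') as f:
        f.write(value)
    # The shelf entry is just a timestamp and a path, whatever the value's size.
    cache[url] = (datetime.datetime.today(), path)


def cache_get(cache, url):
    """Return (timestamp, value) for url, reading the value back from disk."""
    stamp, path = cache[url]
    with open(path) as f:
        return stamp, f.read()
```

With this layout every pickled entry stays small, which is what the library reference asks for, and the list of cached URLs is still available from cache.keys() without knowing the keys in advance.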
OTHER TIPS
Seeing your examples, my first thought is that cache.has_key() has side effects, i.e. this call will add keys to the cache. What do you get for
print cache.has_key('xxx')
print list(cache.keys())