Question

I am creating a dictionary out of a large file.

def make_dic():
    big_dic = {}
    for foo in open(bar):  # bar is the path to the large file
        key, value = do_something(foo)
        big_dic[key] = value
    return big_dic

def main():
    big_dic = make_dic()  # this takes time

I have to access this dictionary many times, but from completely different programs. It takes a lot of time to read the file and build the dictionary. Is it possible to make a dictionary which remains in memory even after one program exits, so that I build it once but can use it again and again from different programs?


The solution

This won't work for all situations that fit your description, but cPickle should help with speed.

The only problem I can think of is that combining data persistence with IPC is tough. So if these different programs are modifying the dictionary at the same time, pickle won't help. Another approach might be to use a database...
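For the simple one-writer, many-readers case, a minimal sketch of that workflow might look like this (the filename big_dic.pkl is my own placeholder; make_dic comes from the question):

import cPickle

def main():
    big_dic = make_dic()  # slow: builds the dict from the large file
    with open('big_dic.pkl', 'wb') as f:
        cPickle.dump(big_dic, f, cPickle.HIGHEST_PROTOCOL)  # binary protocol: faster and smaller

Each of the other programs then loads the pickle instead of re-reading the large file:

import cPickle

with open('big_dic.pkl', 'rb') as f:
    big_dic = cPickle.load(f)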

I like Sven Marnach's suggestion, but there are some tradeoffs worth considering. Some setup...

>>> import anydbm
>>> import cPickle
>>> pickle_file = open('pickle_foo', 'w')
>>> anydbm_file = anydbm.open('anydbm_foo', 'c')
>>> d = dict((str(i), str(j)) for i, j in zip(range(999999, -1, -1), range(0, 1000000)))

Obviously populating the anydbm_file will be pretty slow:

>>> %timeit for k, v in d.iteritems(): anydbm_file[k] = v
1 loops, best of 3: 5.14 s per loop

The time is comparable to the time it takes to dump and load a pickle file:

>>> %timeit cPickle.dump(d, pickle_file)
1 loops, best of 3: 3.79 s per loop
>>> pickle_file.close()
>>> pickle_file = open('pickle_foo', 'r')
>>> %timeit d = cPickle.load(pickle_file)
1 loops, best of 3: 2.03 s per loop

But the anydbm file only has to be created once; after that, reopening it is nearly instantaneous:

>>> %timeit anydbm_file = anydbm.open('anydbm_foo', 'r')
10000 loops, best of 3: 74.3 us per loop

So anydbm has the advantage there. On the other hand,

>>> %timeit for i in range(1, 1000): x = anydbm_file[str(i)]
100 loops, best of 3: 3.15 ms per loop
>>> %timeit for i in range(1, 1000): x = d[str(i)]
1000 loops, best of 3: 374 us per loop

Reading a key from the anydbm file takes roughly ten times longer than reading a key from a dictionary in memory. You'd have to do a lot of lookups for this difference to outweigh the roughly five seconds a pickle dump/load cycle costs; but even if you don't, the slower reads could lead to sluggish performance, depending on what you're doing.
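If your programs tend to look up a small set of keys over and over, one way to soften that penalty (my own sketch, layered on the anydbm_file opened above) is to memoize the anydbm reads in an ordinary dict:

>>> cache = {}
>>> def lookup(key):
...     if key not in cache:
...         cache[key] = anydbm_file[key]  # first access pays the on-disk cost
...     return cache[key]                  # repeat accesses are in-memory lookups

That way you only pay the on-disk price once per distinct key.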

Other options are SQLite3, or, for a separate database server process that accepts connections from multiple concurrently running programs, psycopg2 + PostgreSQL.
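To give a flavor of the sqlite3 route, here is a minimal sketch reusing the d built above; the filename big_dic.db is my own choice. The table lives in a single file that several programs can open independently:

import sqlite3

conn = sqlite3.connect('big_dic.db')  # one file on disk, openable from other programs
conn.execute('CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)')
conn.executemany('INSERT OR REPLACE INTO kv VALUES (?, ?)', d.iteritems())
conn.commit()
print conn.execute('SELECT value FROM kv WHERE key = ?', ('42',)).fetchone()[0]  # prints 999957

Unlike anydbm, sqlite3 also handles concurrent readers (and serializes writers) for you.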

Other tips

The easiest way to persist a dictionary whose keys and values are strings is Python's anydbm module. It essentially gives you a file that acts like a dictionary mapping strings to strings.
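A minimal sketch (the filename dict.db is arbitrary):

import anydbm

db = anydbm.open('dict.db', 'c')  # 'c' creates the file if it doesn't exist
db['key'] = 'value'               # keys and values must both be strings
db.close()

Any other program can then reopen it and read the stored pairs:

import anydbm

db = anydbm.open('dict.db', 'r')  # 'r' opens it read-only
print db['key']                   # prints value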

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow