Question

I'm trying to write an anagram service. The first stage of the program is to go through a dictionary of words and create a Python dictionary with word lengths as keys and lists of the words of each length as values, i.e.:

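from collections import defaultdict

def processedDictionary():
    d = defaultdict(list)
    # Read every word up front; the with block closes the file.
    with open(dictionaryFile, "r") as f:
        lines = f.read().splitlines()
    for line in lines:
        # Group each word under its length.
        length = len(line)
        d[length].append(line)
    return d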
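A minimal sketch of how such a length-keyed dictionary might be queried. The names build_index and find_anagrams and the sample word list are hypothetical, chosen only for illustration; the comparison by sorted letters is one common way to test for anagrams, not necessarily the one used in the script.

```python
from collections import defaultdict


def build_index(words):
    """Group words by length (hypothetical helper mirroring the function above)."""
    index = defaultdict(list)
    for word in words:
        index[len(word)].append(word)
    return index


def find_anagrams(word, index):
    """Return the words in the same length bucket that use the same letters."""
    target = sorted(word)
    # .get avoids creating an empty bucket in the defaultdict for unseen lengths.
    candidates = index.get(len(word), [])
    return [c for c in candidates if c != word and sorted(c) == target]


# Hypothetical sample data standing in for the real word list.
sample = ["listen", "silent", "enlist", "google", "tinsel", "stone"]
index = build_index(sample)
print(find_anagrams("listen", index))  # ['silent', 'enlist', 'tinsel']
```

Only the bucket for length 6 is scanned here, so "stone" is never compared at all.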

This means that the anagram word only has to be compared to words of the same length, via processedDictionary()[length], which speeds up the script. However, I was trying to optimise the script even more, because it is silly that the dictionary has to be 'processed' every time somebody anagrams a word, so I looked at pickle for loading the already processed dictionary each time:

import pickle

def processedDictionary():
    # Load the previously pickled dictionary from disk.
    with open("dic.obj", 'rb') as f:
        return pickle.load(f)

dic.obj is a 2MB dump of the processed dictionary. However, even with cPickle the pickled dictionary loads about twice as slowly as the original script! Can anybody suggest what I am missing here and what the correct route to optimise the dictionary loading is?


Solution

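When you dump the data, make sure you specify the protocol to use. Python 2 defaults to the text-based protocol 0, which is generally larger and slower than the binary protocols: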

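import pickle

with open('dic.obj', 'wb') as fh:
    # The highest protocol is a compact binary format.
    pickle.dump(obj, fh, pickle.HIGHEST_PROTOCOL)

When loading, you should see a speed increase if you can switch to Python 3, where pickle uses its C implementation automatically:

def processedDictionary():
    with open('dic.obj', 'rb') as fh:
        return pickle.load(fh)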
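To check whether the protocol makes a difference on your own data, a rough timing sketch along these lines may help. It assumes Python 3; the synthetic word list, the helper names build_index and dump_and_load, and the temporary file names are all hypothetical. The timings depend on the machine and the data, so treat the printed numbers as indicative only.

```python
import os
import pickle
import tempfile
import time
from collections import defaultdict


def build_index(words):
    """Group words by length, like the processed dictionary in the question."""
    index = defaultdict(list)
    for word in words:
        index[len(word)].append(word)
    return index


def dump_and_load(obj, path, protocol):
    """Pickle obj to path with the given protocol, then time loading it back."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol)
    start = time.perf_counter()
    with open(path, "rb") as fh:
        loaded = pickle.load(fh)
    return loaded, time.perf_counter() - start


# Synthetic stand-in for a real word list (hypothetical data).
words = ["w" + str(i) for i in range(50000)]
index = build_index(words)

with tempfile.TemporaryDirectory() as tmp:
    for protocol in (0, pickle.HIGHEST_PROTOCOL):
        path = os.path.join(tmp, "dic_%d.obj" % protocol)
        loaded, seconds = dump_and_load(index, path, protocol)
        assert loaded == index  # the round trip must preserve the data
        size = os.path.getsize(path)
        print("protocol %d: loaded in %.4f s, %d bytes" % (protocol, seconds, size))
```

Comparing these load times with the time taken to rebuild the dictionary from the plain word file shows whether pickling is worth it at all for a file of this size.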

Storing the pickled file on a separate medium is also recommended, because running everything from the same device can slow down reading.

Licensed under: CC-BY-SA with attribution