The problem begins right here:
file(filename).read()
This reads the entire file into memory as a single string. If you instead process the file line by line (or chunk by chunk), you won't run into the memory problem:
with open(filename) as f:
    for line in f:
        ...  # process one line at a time; only the current line is held in memory
You could also benefit from using a collections.Counter to count the frequency of words.
In [1]: import collections
In [2]: freq = collections.Counter()
In [3]: line = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod'
In [4]: freq.update(line.split())
In [5]: freq
Out[5]: Counter({'ipsum': 1, 'amet,': 1, 'do': 1, 'sit': 1, 'eiusmod': 1, 'consectetur': 1, 'sed': 1, 'elit,': 1, 'dolor': 1, 'Lorem': 1, 'adipisicing': 1})
And to count some more words,
In [6]: freq.update(line.split())
In [7]: freq
Out[7]: Counter({'ipsum': 2, 'amet,': 2, 'do': 2, 'sit': 2, 'eiusmod': 2, 'consectetur': 2, 'sed': 2, 'elit,': 2, 'dolor': 2, 'Lorem': 2, 'adipisicing': 2})
A collections.Counter is a subclass of dict, so you can use it in ways you are already familiar with. In addition, it has some useful methods for counting, such as most_common.
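Putting the pieces together, here is a minimal sketch of the whole task (the file path is a placeholder, and splitting on whitespace is an assumption; note it leaves punctuation such as the trailing commas above attached to the words):

import collections

filename = 'lorem.txt'  # placeholder path; substitute your own file

freq = collections.Counter()
with open(filename) as f:
    for line in f:  # one line in memory at a time
        freq.update(line.split())

# the ten most frequent words, as (word, count) pairs
print(freq.most_common(10))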