Question

OK, so I have a problem that I really need help with.

My program reads values from a PDB file and stores them in a list (array = []). I then take every combination of 4 of those stored values and store the result in maxcoorlist. Because the number of combinations is so large, I would like to speed things up by taking a sample of 1000-10000 from the combinations. However, when I do so I get a MemoryError on the very line that takes the random sample.

MemoryError                               Traceback (most recent call last)
<ipython-input-14-18438997b8c9> in <module>()
     77     maxcoorlist= itertools.combinations(array,4)
     78     random.seed(10)
---> 79     volumesample= random_sample(list(maxcoorlist), 1000)
     80     vol_list= [side(i) for i in volumesample]
     81     maxcoor=max(vol_list)

MemoryError: 

It is important that I use random.seed() in this code as well, as I will be taking other samples with the seed.


Solution

As mentioned in the other answers, the list() call is running you out of memory.

Instead, first iterate over maxcoorlist to find out its length. Then generate random numbers in the range [0, length) and add them to an index set until the set holds 1000 indices.

Because itertools.combinations returns a one-shot iterator, the counting pass exhausts it, so create it again. Then iterate through the new iterator and add the current value to your sample whenever the current index is in the index set.
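The steps above could be sketched roughly as follows. The function name sample_combinations, its parameters, and the stand-in data are made up for illustration, and a private random.Random(seed) is used instead of the global random.seed(10) only to keep the example self-contained.

```python
import itertools
import random


def sample_combinations(items, r, k, seed=None):
    """Return k distinct r-combinations of items without building the full list."""
    rng = random.Random(seed)

    # Pass 1: count the combinations by iterating (slow, but uses no extra memory).
    length = sum(1 for _ in itertools.combinations(items, r))

    # Add random indices in [0, length) to a set until it holds k of them.
    wanted = set()
    while len(wanted) < k:
        wanted.add(rng.randrange(length))

    # Pass 2: the first iterator is exhausted, so build a fresh one
    # and keep only the combinations whose position was chosen.
    picked = []
    for index, combo in enumerate(itertools.combinations(items, r)):
        if index in wanted:
            picked.append(combo)
            if len(picked) == k:
                break  # nothing left to collect
    return picked


points = list(range(30))  # stand-in data, not real PDB coordinates
sample = sample_combinations(points, 4, 1000, seed=10)
```

The result comes back in the order combinations() produces it rather than in random order, which does not matter if you only take the max afterwards. The loop that fills the index set assumes k is no larger than the number of combinations; otherwise it would never finish.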

EDIT

An optimization is to directly calculate the length of maxcoorlist instead of iterating over it:

import math
n = len(array)
r = 4
# Use // so the result stays an exact integer in Python 3 as well.
length = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))
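As a quick sanity check of that formula, and assuming Python 3.8 or newer is available, math.comb gives the same count directly. The sizes below are example values only.

```python
import math

n, r = 30, 4  # example sizes only

# Factorial formula with integer division, as in the snippet above.
by_factorials = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# Built-in binomial coefficient (added in Python 3.8).
by_comb = math.comb(n, r)

print(by_factorials, by_comb)  # 27405 27405
```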

OTHER TIPS

maxcoorlist= itertools.combinations(array,4)
...
volumesample= random_sample(list(maxcoorlist), 1000)

When you compute volumesample, list(maxcoorlist) first builds a list of every combination, and only then is it sampled down to 1000.

Instead of a sample, which requires the entire list to be built, perhaps apply islice to the iterator, such as:

from itertools import islice
volumesample = list(islice(maxcoorlist, 1000))

This takes the first 1000 combinations, which is not a random sample; you could tweak it to take every nth item or similar to get a more sample-like effect.
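One possible way to take every nth combination is to pass a step to islice. This is only a sketch: the step is derived from the total count using math.comb (Python 3.8+), the array below is stand-in data, and the result is a deterministic, evenly spaced selection rather than a random one, so random.seed() has no effect on it.

```python
import itertools
from itertools import islice
from math import comb

array = list(range(30))  # stand-in data, not real PDB coordinates
k = 1000                 # how many combinations to keep

total = comb(len(array), 4)  # number of 4-combinations
step = max(1, total // k)    # spacing between the items we keep

maxcoorlist = itertools.combinations(array, 4)
# Keep items 0, step, 2*step, ... and stop once k of them are taken.
volumesample = list(islice(maxcoorlist, 0, step * k, step))
```

Keep in mind that neighbouring combinations share most of their elements, so an evenly spaced selection may still be less varied than a true random sample.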

The memory (and time) is going into materializing every combination: maxcoorlist itself is a lazy iterator, but list() has to build all of the combinations at once. You should probably generate the 1000 random combinations yourself: sample 4 elements at random, sort them, and check whether that combination has already been picked with this_combination in combination_list. If combination_list is a set, this check will be O(1).

This way you take only as much memory as you need.
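A minimal sketch of that idea might look like this. The names random_combinations, seen and result are hypothetical. Positions are sampled instead of the values themselves, so the elements do not need to be sortable, and a private random.Random(seed) stands in for the global random.seed(10).

```python
import random
from math import comb  # Python 3.8+


def random_combinations(items, r, k, seed=None):
    """Draw k distinct r-combinations directly, never enumerating them all."""
    n = len(items)
    if k > comb(n, r):
        raise ValueError("asked for more combinations than exist")

    rng = random.Random(seed)
    seen = set()  # index tuples already picked, for O(1) duplicate checks
    result = []   # combinations in the order they were drawn
    while len(result) < k:
        # Pick r distinct positions and sort them so that the same
        # combination always maps to the same tuple.
        idx = tuple(sorted(rng.sample(range(n), r)))
        if idx not in seen:
            seen.add(idx)
            result.append(tuple(items[i] for i in idx))
    return result


points = list(range(30))  # stand-in data, not real PDB coordinates
sample = random_combinations(points, 4, 1000, seed=10)
```

Memory use grows with k rather than with the total number of combinations, and the same seed gives the same sample on every run.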

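How about refactoring your code to use a tuple rather than a list? A tuple has slightly less overhead, though tuple() still builds every combination in memory, so it may not be enough for a large array: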

maxcoorlist= itertools.combinations(array,4)
random.seed(10)
volumesample= random.sample(tuple(maxcoorlist), 1000)
vol_list= [side(i) for i in volumesample]
maxcoor=max(vol_list)
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow