Question

I have a large numpy array that I am going to take a linear projection of using randomly generated values.

>>> input_array.shape
(50, 200000)
>>> random_array = np.random.normal(size=(200000, 300))
>>> output_array = np.dot(input_array, random_array)

Unfortunately, random_array takes up a lot of memory, and my machine starts swapping. It seems to me that I don't actually need all of random_array around at once; in theory, I ought to be able to generate it lazily during the dot product calculation...but I can't figure out how.

How can I reduce the memory footprint of the calculation of output_array from input_array?


Solution

This obviously isn't the fastest solution, but have you tried:

m, inner = input_array.shape
n = 300
out = np.empty((m, n))
# Generate one random column at a time, so only `inner` random
# values exist in memory at once instead of inner * n.
for i in range(n):
    out[:, i] = np.dot(input_array, np.random.normal(size=inner))
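A middle ground between the one-column loop above and materializing the full matrix is to generate the random matrix a block of columns at a time. The function name, the `chunk` parameter, and the use of `np.random.default_rng` are illustrative choices, not part of the original answer; this is a sketch of the same idea with a tunable memory/speed trade-off:

```python
import numpy as np

def random_projection_chunked(input_array, n_out=300, chunk=50, rng=None):
    """Project input_array with a random normal matrix generated
    `chunk` columns at a time, so at most (inner, chunk) random
    values are held in memory at once."""
    if rng is None:
        rng = np.random.default_rng()
    m, inner = input_array.shape
    out = np.empty((m, n_out))
    for start in range(0, n_out, chunk):
        stop = min(start + chunk, n_out)
        # Generate only this block of the random matrix.
        block = rng.normal(size=(inner, stop - start))
        out[:, start:stop] = input_array @ block
    return out
```

Larger `chunk` values spend more memory on the random block but let BLAS do bigger matrix multiplications, which is usually faster than the column-at-a-time loop.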

OTHER TIPS

This might be a situation where using cython could reduce your memory usage. You could generate the random numbers on the fly and accumulate the result as you go. I don't have the time to write and test the full function, but you would definitely want to use randomkit (the library that numpy uses under the hood) at the C level.

You can take a look at some example code I wrote for another application to see how to wrap randomkit:

https://github.com/synapticarbors/pylangevin-integrator/blob/master/cIntegrator.pyx

And also check out how matrix multiplication is implemented in the following paper on cython:

http://conference.scipy.org/proceedings/SciPy2009/paper_2/full_text.pdf

Instead of taking both arrays as inputs, take only input_array, and inside the method generate small chunks of the random array as you go.

Sorry if it is just a sketch instead of actual code, but hopefully it is enough to get you started.
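The accumulate-as-you-go idea can also be sketched in pure NumPy, without cython, by chunking along the inner (200000-element) dimension and summing partial products. The function name and `chunk` size below are illustrative, not from the original answer:

```python
import numpy as np

def random_projection_accumulate(input_array, n_out=300, chunk=10000, rng=None):
    """Accumulate the projection chunk by chunk along the inner
    dimension, so only a (chunk, n_out) slice of the random matrix
    exists at any moment."""
    if rng is None:
        rng = np.random.default_rng()
    m, inner = input_array.shape
    out = np.zeros((m, n_out))
    for start in range(0, inner, chunk):
        stop = min(start + chunk, inner)
        # Random rows corresponding to this slice of the inner dimension.
        block = rng.normal(size=(stop - start, n_out))
        out += input_array[:, start:stop] @ block
    return out
```

This exploits the block decomposition of matrix multiplication: the full product is exactly the sum of the chunk-wise partial products, so the result is a valid random projection even though the full random matrix never exists in memory.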

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow