Question

I'm using a Python library for deep learning and neural networks. The computer I'm running on has 16 GB of RAM @ 1866 MHz. At first my input data file was too large, so I broke it into smaller pieces:

-rw-rw-r-- 1 jt jt 1.8G Mar 20 18:09 covarFile.pkl

caused:

Traceback (most recent call last):
  File "PYJT2/pp_dbn.py", line 69, in <module>
    go()
  File "PYJT2/pp_dbn.py", line 32, in go
    model = cPickle.load(open(CONTROL_DBN.INPUT, "rb"))
MemoryError

Since the file was just a numpy array of numpy arrays, I could break it into separate files and recreate the larger array dynamically in the program by loading the numerous pickle files.

total 5.2G
drwxrwxr-x 2 jt jt 4.0K Mar 20 18:15 ./
drwxrwxr-x 4 jt jt 4.0K Mar 20 18:15 ../
-rw-rw-r-- 1 jt jt 351M Mar 20 18:09 outfile-0.pkl
-rw-rw-r-- 1 jt jt 351M Mar 20 18:11 outfile-10.pkl
-rw-rw-r-- 1 jt jt 351M Mar 20 18:11 outfile-11.pkl
-rw-rw-r-- 1 jt jt 351M Mar 20 18:12 outfile-12.pkl
-rw-rw-r-- 1 jt jt 351M Mar 20 18:12 outfile-13.pkl
-rw-rw-r-- 1 jt jt 351M Mar 20 18:12 outfile-14.pkl
-rw-rw-r-- 1 jt jt 2.3M Mar 20 18:12 outfile-15.pkl
-rw-rw-r-- 1 jt jt 351M Mar 20 18:09 outfile-1.pkl
-rw-rw-r-- 1 jt jt 351M Mar 20 18:09 outfile-2.pkl
-rw-rw-r-- 1 jt jt 351M Mar 20 18:10 outfile-3.pkl
-rw-rw-r-- 1 jt jt 351M Mar 20 18:10 outfile-4.pkl
-rw-rw-r-- 1 jt jt 351M Mar 20 18:10 outfile-5.pkl
-rw-rw-r-- 1 jt jt 351M Mar 20 18:10 outfile-6.pkl
-rw-rw-r-- 1 jt jt 351M Mar 20 18:11 outfile-7.pkl
-rw-rw-r-- 1 jt jt 351M Mar 20 18:11 outfile-8.pkl
-rw-rw-r-- 1 jt jt 351M Mar 20 18:11 outfile-9.pkl
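A rough sketch of that chunk-and-reload approach (illustrative only; the function names, chunk count, and file prefix are assumptions, not the exact code used):

    import pickle
    import numpy as np

    N_CHUNKS = 16

    # Split the big array of arrays into roughly equal chunks and pickle each one.
    def dump_chunks(big_array, prefix="outfile"):
        for i, chunk in enumerate(np.array_split(big_array, N_CHUNKS)):
            with open("%s-%d.pkl" % (prefix, i), "wb") as f:
                pickle.dump(chunk, f, protocol=pickle.HIGHEST_PROTOCOL)

    # Load the chunks back and stitch them together into one array.
    def load_chunks(prefix="outfile"):
        chunks = []
        for i in range(N_CHUNKS):
            with open("%s-%d.pkl" % (prefix, i), "rb") as f:
                chunks.append(pickle.load(f))
        return np.concatenate(chunks)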

And this solution worked fine. My problem is that now I have an enormous file causing a MemoryError that I don't know how to break up further. It is a Theano tensor variable representing a 30,000 x 30,000 matrix of floating-point numbers. My questions:

  1. Is there a method to save something across multiple pkl files even if you are unsure of how to divide the underlying data structure?
  2. Will running this on our lab's server (48 GB) work better? Or is this memory error independent of the architecture?
  3. Is the huge pkl file I have now, which is too large to load, worthless? I hope not; it represents around 8 hours of neural network training.
  4. Are there any other solutions besides using a database that anyone can think of? If at all possible, I would strongly prefer not to use databases, because I've already had to transfer the software to numerous servers, many of which I do not have root access to, and getting additional software installed is a pain.

Solution

First, pickle isn't great for saving binary data and isn't memory friendly: it must copy all the data in RAM before writing it to disk, so it doubles the memory usage! You can use numpy.save and numpy.load to store ndarrays without that memory doubling.
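A minimal sketch of that approach (the file name and array are just examples):

    import numpy as np

    data = np.random.rand(1000, 1000).astype(np.float32)

    # np.save writes the raw array buffer to disk without pickling the whole
    # object graph, avoiding the extra in-memory copy that pickle makes.
    np.save("covar.npy", data)

    # np.load reads it back as an ndarray; with mmap_mode the array is
    # memory-mapped instead of being read fully into RAM.
    loaded = np.load("covar.npy", mmap_mode="r")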

For the Theano variable, I guess you are using a Theano shared variable. By default, when you read it via get_value(), it copies the data. You can use get_value(borrow=True) to avoid that copy.
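For example (a sketch; W is a hypothetical variable name, and a smaller shape is used here for illustration):

    import numpy as np
    import theano

    # A shared variable holding a large float32 matrix.
    W = theano.shared(np.zeros((5000, 5000), dtype=np.float32), name="W")

    # get_value() returns a copy of the underlying array by default.
    w_copy = W.get_value()

    # borrow=True returns the internal buffer without copying, so you don't
    # hold a second full-size matrix in RAM (but don't modify it in place).
    w_view = W.get_value(borrow=True)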

Both of those changes together could lower the memory usage by about 3x. If that isn't enough, or if you are sick of handling multiple files yourself, I would suggest that you use PyTables: http://www.pytables.org/. It lets you store one big ndarray in a file larger than the available RAM, and it gives you an object that you can manipulate very much like an ndarray.
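A minimal PyTables sketch (uses the current tables.open_file / create_carray API; the file and node names are illustrative):

    import numpy as np
    import tables

    # Create an HDF5 file with a 30,000 x 30,000 float32 array stored on disk.
    with tables.open_file("weights.h5", mode="w") as h5:
        carray = h5.create_carray(h5.root, "W", tables.Float32Atom(),
                                  shape=(30000, 30000))
        # Write the data in row blocks so only one block is in RAM at a time.
        block = 1000
        for start in range(0, 30000, block):
            carray[start:start + block, :] = (
                np.random.rand(block, 30000).astype(np.float32))

    # Later, slices are read back on demand without loading the whole matrix.
    with tables.open_file("weights.h5", mode="r") as h5:
        row_block = h5.root.W[0:1000, :]   # an ordinary in-memory ndarray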

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow