First, pickle files aren't great for saving binary data, and they aren't memory friendly: pickle must copy all the data in RAM before writing it to disk, so this doubles the memory usage! You can use numpy.save and numpy.load to store ndarrays without that memory doubling.
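Here is a minimal sketch of the numpy.save / numpy.load round trip. The file name weights.npy and the array contents are placeholders for your own data.

```python
import os
import tempfile

import numpy as np

# Placeholder location and data; substitute your own path and array.
path = os.path.join(tempfile.mkdtemp(), "weights.npy")
weights = np.arange(12, dtype=np.float32).reshape(3, 4)

# np.save writes the .npy format: a small header followed by the raw array
# buffer. It does not first build a separate in-memory bytes copy of the array.
np.save(path, weights)

# np.load reads the header and the raw data back into a new ndarray.
restored = np.load(path)
```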
For the Theano variable, I guess you are using a Theano shared variable. By default, when you get its value via get_value(), it copies the data. You can use get_value(borrow=True) to avoid this copy.
Together, those two changes could cut memory usage by about 3x. If this isn't enough, or if you are tired of handling multiple files yourself, I would suggest using PyTables: http://www.pytables.org/ It allows you to store one big ndarray in a file larger than the available RAM, and it gives you an object that you can manipulate very much like an ndarray.
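A minimal PyTables sketch, assuming the tables package is installed. The file name big_array.h5, the node name data, and the chunk sizes are hypothetical choices for illustration. It appends data in chunks to an extendable array on disk and then reads back only a slice:

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical file location, chosen for illustration only.
path = os.path.join(tempfile.mkdtemp(), "big_array.h5")

with tables.open_file(path, mode="w") as h5:
    # An EArray can grow along its 0-length first dimension, so data can be
    # appended chunk by chunk without ever holding the whole array in RAM.
    data = h5.create_earray(h5.root, "data", atom=tables.Float32Atom(), shape=(0, 4))
    for i in range(5):
        chunk = np.full((1000, 4), i, dtype=np.float32)
        data.append(chunk)

with tables.open_file(path, mode="r") as h5:
    data = h5.root.data
    n_rows = data.shape[0]
    # Slicing reads only the requested rows from disk and returns a plain ndarray.
    window = data[1500:1510]
```

Rows 1500 to 1509 come from the second chunk, so every value in window is 1.0. Only those ten rows are read from the file.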