I have heard that one of the main problems with applying neural style transfer to high-resolution images is the huge amount of memory it uses.

Also, I just configured a network using tiny-cnn.

This is my calculation of the number of neurons and the number of weights per layer in my example:

conv_layer_# height width depth filter_height filter_width neurons_(h*w*d) weights
1            512    512   3     3             3            786432          108    
2            256    256   6     3             3            393216          216    
3            128    128   12    3             3            196608          432    
4            64     64    24    3             3            98304           864    
5            32     32    48    3             3            49152           1728   
6            16     16    96    3             3            24576           3456   
7            8      8     192   3             3            12288           6912   

                                                           1560576         13716  

If we give every value that has to be stored 8 bytes (double-precision floating point format), we end up with 12,594,336 bytes, i.e. about 12 MB. But the net allocates over 1 GB of RAM when I train it. What is all that memory needed for?
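
For reference, here is that back-of-the-envelope estimate as a small standalone program. The neuron counts follow h*w*d from the table; the weight total is copied from the table as given (the accepted answer below corrects it):

    #include <cstdio>

    int main() {
        // {height, width, depth} for each conv layer row in the table above
        const int layers[7][3] = {
            {512, 512,   3}, {256, 256,   6}, {128, 128,  12}, {64, 64, 24},
            { 32,  32,  48}, { 16,  16,  96}, {  8,   8, 192}
        };
        // weight column total copied from the table (corrected in point 5 of the answer)
        const long long table_weights = 13716;

        long long neurons = 0;
        for (const auto& l : layers)
            neurons += 1LL * l[0] * l[1] * l[2];            // h * w * d

        const long long values = neurons + table_weights;   // 1560576 + 13716
        std::printf("neurons: %lld, weights: %lld\n", neurons, table_weights);
        std::printf("at 8 bytes per value: %lld bytes (~%.1f MB)\n",
                    values * 8, values * 8 / (1024.0 * 1024.0));
        return 0;
    }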

Solution

I found out some of the factors that may contribute to the effect.

1) At least in tiny-cnn, some of the buffers are allocated not just once but once per worker thread. On a machine with 8 CPU threads, this can increase the memory usage a lot. In debug mode with MS VC++ 2015, the following two lines in the code base allocate a big chunk, both related to worker threads: ith_in_node(i)->set_worker_size(worker_count); and ith_out_node(i)->set_worker_size(worker_count);. A toy sketch of the effect follows.
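
This is only an illustration, not tiny-cnn's actual data structures: it just shows how keeping one copy of a layer's buffers per worker thread scales memory linearly with the thread count.

    // Toy illustration only, not tiny-cnn internals: one buffer copy per
    // worker thread means memory grows linearly with the number of threads.
    #include <cstdio>
    #include <vector>

    struct LayerBuffers {
        std::vector<std::vector<double>> per_worker;  // one buffer per worker

        LayerBuffers(std::size_t values, std::size_t workers)
            : per_worker(workers, std::vector<double>(values)) {}

        std::size_t bytes() const {
            std::size_t total = 0;
            for (const auto& b : per_worker) total += b.size() * sizeof(double);
            return total;
        }
    };

    int main() {
        const std::size_t values = 1574292;  // neurons + weights from the question
        for (std::size_t workers : {1, 8}) {
            LayerBuffers buffers(values, workers);
            std::printf("%zu worker(s): %.1f MB\n",
                        workers, buffers.bytes() / (1024.0 * 1024.0));
        }
        return 0;
    }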

2) In addition to the values for the neurons and weights listed in my question, gradients and other state for the backward pass and the optimizer also have to be stored.
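
A rough sketch of this point; the exact buffers depend on the optimizer, and the two-moments-per-weight term below is just an Adam-style example, not necessarily what tiny-cnn allocates:

    // Rough sketch: training needs a gradient per value in addition to the
    // forward values, plus optimizer state per weight (e.g. two moments for Adam).
    #include <cstdio>

    int main() {
        const long long neurons = 1560576;  // activations from the table
        const long long weights = 13716;    // naive count, corrected in point 5
        const int bytes = 8;                // double precision

        const long long forward   = (neurons + weights) * bytes;
        const long long gradients = (neurons + weights) * bytes;  // dE/dx and dE/dw
        const long long opt_state = 2 * weights * bytes;          // e.g. Adam moments

        std::printf("forward values only: %lld bytes\n", forward);
        std::printf("with gradients and optimizer state: %lld bytes\n",
                    forward + gradients + opt_state);
        return 0;
    }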

3) I am not sure whether this is relevant for tiny-cnn, but many frameworks use an operation called im2col. It makes the convolution much faster by expressing it as a matrix multiplication, but for filters that are 3*3 in height and width it blows up the number of values in the input to a convolution by a factor of 9. Justin Johnson explains it in the lecture "CS231n Winter 2016: Lecture 11 - ConvNets in practice", starting at 36:22.
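
Here is a minimal im2col sketch (stride 1, zero "same" padding, written from scratch rather than taken from any framework) that shows the factor-of-9 blow-up for a 3*3 kernel on the 512*512*3 input:

    // Minimal im2col sketch: each input value is replicated into up to k*k
    // columns, so a 3x3 kernel inflates the input volume by roughly 9x.
    #include <cstdio>
    #include <vector>

    // Lays out an H x W x C input (stored as C planes of H x W) as a
    // (C*k*k) x (H*W) matrix, one column per output pixel, so the convolution
    // becomes a single matrix multiplication.
    std::vector<float> im2col(const std::vector<float>& in,
                              int H, int W, int C, int k) {
        const int pad = k / 2;  // "same" padding, stride 1, k assumed odd
        std::vector<float> out(static_cast<std::size_t>(C) * k * k * H * W, 0.0f);
        std::size_t col = 0;
        for (int y = 0; y < H; ++y) {
            for (int x = 0; x < W; ++x) {
                std::size_t row = 0;
                for (int c = 0; c < C; ++c)
                    for (int dy = -pad; dy <= pad; ++dy)
                        for (int dx = -pad; dx <= pad; ++dx) {
                            const int yy = y + dy, xx = x + dx;
                            float v = 0.0f;  // zero padding outside the image
                            if (yy >= 0 && yy < H && xx >= 0 && xx < W)
                                v = in[(static_cast<std::size_t>(c) * H + yy) * W + xx];
                            out[row * (static_cast<std::size_t>(H) * W) + col] = v;
                            ++row;
                        }
                ++col;
            }
        }
        return out;
    }

    int main() {
        const int H = 512, W = 512, C = 3, k = 3;
        std::vector<float> input(static_cast<std::size_t>(H) * W * C, 1.0f);
        std::vector<float> matrix = im2col(input, H, W, C, k);
        std::printf("input values : %zu\n", input.size());
        std::printf("im2col values: %zu (factor %.0f)\n",
                    matrix.size(), double(matrix.size()) / input.size());
        return 0;
    }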

4) There was an error in my initial calculation. When a volume of 512*512*3 is convolved with six 3*3 filters and then sent into an average pooling layer, the resulting volume is 256*256*6, but in between it is 512*512*6, which contributes another factor of 2.

5) There was another error in my initial calculation. I demonstrate it on the last conv layer (7). It takes a volume of 16*16*96 to a volume of 8*8*192 with filters of size 3*3. This means every filter has 3*3*96 weights, and there are 192 of them, resulting in 165888 (3*3*96*192) weights overall for this layer, not 6912.
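
Putting points 4) and 5) together, here is a sketch of the corrected count. It assumes the architecture implied above: conv stages with 3*3 filters that double the depth, each followed by 2*2 average pooling, going from 512*512*3 down to 8*8*192, i.e. six such stages, so that the last one matches the 16*16*96 -> 8*8*192 example from point 5:

    // Corrected counts under the assumptions above: per stage, weights are
    // 3*3*d_in*d_out, and the intermediate volume before pooling is included.
    #include <cstdio>

    int main() {
        int h = 512, w = 512, d = 3;
        long long activations = 1LL * h * w * d;  // the input volume itself
        long long weights = 0;

        for (int stage = 0; stage < 6; ++stage) {
            const int d_out = 2 * d;
            weights     += 1LL * 3 * 3 * d * d_out;  // 3x3xd filters, d_out of them
            activations += 1LL * h * w * d_out;      // intermediate volume before pooling
            h /= 2; w /= 2; d = d_out;
            activations += 1LL * h * w * d;          // volume after average pooling
        }

        std::printf("weights    : %lld (vs. 13716 in the question)\n", weights);
        std::printf("activations: %lld (vs. 1560576 in the question)\n", activations);
        std::printf("last stage : 3*3*96*192 = %d weights\n", 3 * 3 * 96 * 192);
        return 0;
    }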

Multiplying just the factors 8 (worker threads, point 1), 9 (im2col, point 3) and 2 (the intermediate volumes before pooling, point 4) already gives a combined factor of 144, which seems enough to explain the high memory consumption.

Other tips

Your calculation for the amount of memory used appears to be based on the number of neurons in the network and on storing a double for each, but that isn't the only storage that is required -- each neuron also has a number of weights associated with it, each of which is likely to need at least a float. This is the last column in your output, and (at least if I understand the program you're using correctly) it is the number of weights stored per neuron. Assuming each weight is a float, this adds up to a little over 2 GB.
