Is it possible to train stylegan2 with a custom dataset using a graphics card that only has 6GB of VRAM (GeForce GTX 1660)?

https://datascience.stackexchange.com/questions/74666

11-12-2020

Question

I'm attempting to train stylegan2 using a custom dataset, but no matter what settings I use I see the same error:

2020-05-22 11:15:05.261933: W tensorflow/core/common_runtime/bfc_allocator.cc:305] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.
2020-05-22 11:15:05.339186: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 3.52G (3781073152 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory

I'm assuming this means I need more GPU memory, but I've read that you can lower memory use in exchange for longer training times. I did have to downgrade from TensorFlow 2 to 1.15 to use this project, so there could be some underlying configuration issue, but I am able to generate images from the pretrained models without any issues.

This is how I'm running the training process:

python run_training.py --num-gpus=1 --data-dir=datasets --config=config-e --dataset=customdata --mirror-augment=true

I've tried the other config-x options and adjusted settings in both run_training.py and training/training_loop.py; mostly I've been trying different values for sched.minibatch_size_base and sched.minibatch_gpu_base. The results folder confirms that the settings I changed in run_training.py are actually used during training.
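
For readers unfamiliar with these two settings, here is a minimal sketch of how they are usually understood to interact. It assumes that the total minibatch is split into chunks of minibatch_gpu * num_gpus images and that gradients are accumulated across the chunks. The function name accumulation_rounds is hypothetical and is not part of the stylegan2 code base.

```python
def accumulation_rounds(minibatch_size, minibatch_gpu, num_gpus=1):
    """Number of forward/backward passes needed to cover one full minibatch.

    Assumption: each pass processes minibatch_gpu images on each GPU, and
    gradients are summed over the passes before the weights are updated.
    """
    per_pass = minibatch_gpu * num_gpus
    if per_pass <= 0 or minibatch_size % per_pass != 0:
        raise ValueError("minibatch_size must be a positive multiple of "
                         "minibatch_gpu * num_gpus")
    return minibatch_size // per_pass


# Values taken from the sched_args in submit_config.txt further down.
rounds = accumulation_rounds(minibatch_size=8, minibatch_gpu=1, num_gpus=1)
print(rounds)
```

Under that assumption, peak activation memory depends mainly on minibatch_gpu, so lowering minibatch_size alone would change the number of accumulation passes rather than the memory needed per pass.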

Here's the complete log from run_training.py if it's useful:

Local submit - run_dir: results\00021-stylegan2-customdata-1gpu-config-e
dnnlib: Running training.training_loop.training_loop() on localhost...
2020-05-22 13:02:45.261043: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-05-22 13:02:51.127997: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-05-22 13:02:51.169757: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1660 major: 7 minor: 5 memoryClockRate(GHz): 1.785
pciBusID: 0000:01:00.0
2020-05-22 13:02:51.176966: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-05-22 13:02:51.187788: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-05-22 13:02:51.197589: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2020-05-22 13:02:51.205389: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2020-05-22 13:02:51.216122: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2020-05-22 13:02:51.225483: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2020-05-22 13:02:51.244887: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-05-22 13:02:51.253430: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-05-22 13:02:51.966561: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-22 13:02:51.971731: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2020-05-22 13:02:51.974966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2020-05-22 13:02:51.979741: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4630 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660, pci bus id: 0000:01:00.0, compute capability: 7.5)
Streaming data using training.dataset.TFRecordDataset...
self.tfrecord_dir: datasets\customdata
Dataset shape = [3, 64, 64]
Dynamic range = [0, 255]   
Label size    = 0
Constructing networks...
Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... 2020-05-22 13:03:25.588173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1660 major: 7 minor: 5 memoryClockRate(GHz): 1.785
pciBusID: 0000:01:00.0
2020-05-22 13:03:25.596152: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-05-22 13:03:25.600627: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-05-22 13:03:25.605487: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2020-05-22 13:03:25.610555: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll  
2020-05-22 13:03:25.618346: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2020-05-22 13:03:25.622514: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2020-05-22 13:03:25.626790: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll     
2020-05-22 13:03:25.632722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-05-22 13:03:25.638261: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-22 13:03:25.642363: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-05-22 13:03:25.645684: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-05-22 13:03:25.649560: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 4630 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660, pci bus id: 0000:01:00.0, compute capability: 7.5)
Loading... Done.
Setting up TensorFlow plugin "upfirdn_2d.cu": Preprocessing... 2020-05-22 13:03:50.302225: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1660 major: 7 minor: 5 memoryClockRate(GHz): 1.785
pciBusID: 0000:01:00.0
2020-05-22 13:03:50.310782: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-05-22 13:03:50.316161: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-05-22 13:03:50.395110: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2020-05-22 13:03:50.463435: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2020-05-22 13:03:50.468677: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2020-05-22 13:03:50.527377: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2020-05-22 13:03:50.531735: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-05-22 13:03:50.537159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-05-22 13:03:50.615931: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-22 13:03:50.679408: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-05-22 13:03:50.682438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2020-05-22 13:03:50.686257: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 4630 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660, pci bus id: 0000:01:00.0, compute capability: 7.5)
Loading... Done.

G                           Params    OutputShape       WeightShape     
---                         ---       ---               ---
latents_in                  -         (?, 512)          -
labels_in                   -         (?, 0)            -
lod                         -         ()                -
dlatent_avg                 -         (512,)            -
G_mapping/latents_in        -         (?, 512)          -
G_mapping/labels_in         -         (?, 0)            -
G_mapping/Normalize         -         (?, 512)          -
G_mapping/Dense0            262656    (?, 512)          (512, 512)
G_mapping/Dense1            262656    (?, 512)          (512, 512)
G_mapping/Dense2            262656    (?, 512)          (512, 512)
G_mapping/Dense3            262656    (?, 512)          (512, 512)
G_mapping/Dense4            262656    (?, 512)          (512, 512)
G_mapping/Dense5            262656    (?, 512)          (512, 512)
G_mapping/Dense6            262656    (?, 512)          (512, 512)
G_mapping/Dense7            262656    (?, 512)          (512, 512)
G_mapping/Broadcast         -         (?, 10, 512)      -
G_mapping/dlatents_out      -         (?, 10, 512)      -
Truncation/Lerp             -         (?, 10, 512)      -
G_synthesis/dlatents_in     -         (?, 10, 512)      -
G_synthesis/4x4/Const       8192      (?, 512, 4, 4)    (1, 512, 4, 4)
G_synthesis/4x4/Conv        2622465   (?, 512, 4, 4)    (3, 3, 512, 512)
G_synthesis/4x4/ToRGB       264195    (?, 3, 4, 4)      (1, 1, 512, 3)
G_synthesis/8x8/Conv0_up    2622465   (?, 512, 8, 8)    (3, 3, 512, 512)
G_synthesis/8x8/Conv1       2622465   (?, 512, 8, 8)    (3, 3, 512, 512)
G_synthesis/8x8/Upsample    -         (?, 3, 8, 8)      -
G_synthesis/8x8/ToRGB       264195    (?, 3, 8, 8)      (1, 1, 512, 3)
G_synthesis/16x16/Conv0_up  2622465   (?, 512, 16, 16)  (3, 3, 512, 512)
G_synthesis/16x16/Conv1     2622465   (?, 512, 16, 16)  (3, 3, 512, 512)
G_synthesis/16x16/Upsample  -         (?, 3, 16, 16)    -
G_synthesis/16x16/ToRGB     264195    (?, 3, 16, 16)    (1, 1, 512, 3)
G_synthesis/32x32/Conv0_up  2622465   (?, 512, 32, 32)  (3, 3, 512, 512)
G_synthesis/32x32/Conv1     2622465   (?, 512, 32, 32)  (3, 3, 512, 512)
G_synthesis/32x32/Upsample  -         (?, 3, 32, 32)    -
G_synthesis/32x32/ToRGB     264195    (?, 3, 32, 32)    (1, 1, 512, 3)
G_synthesis/64x64/Conv0_up  1442561   (?, 256, 64, 64)  (3, 3, 512, 256)
G_synthesis/64x64/Conv1     721409    (?, 256, 64, 64)  (3, 3, 256, 256)
G_synthesis/64x64/Upsample  -         (?, 3, 64, 64)    -
G_synthesis/64x64/ToRGB     132099    (?, 3, 64, 64)    (1, 1, 256, 3)
G_synthesis/images_out      -         (?, 3, 64, 64)    -
G_synthesis/noise0          -         (1, 1, 4, 4)      -
G_synthesis/noise1          -         (1, 1, 8, 8)      -
G_synthesis/noise2          -         (1, 1, 8, 8)      -
G_synthesis/noise3          -         (1, 1, 16, 16)    -
G_synthesis/noise4          -         (1, 1, 16, 16)    -
G_synthesis/noise5          -         (1, 1, 32, 32)    -
G_synthesis/noise6          -         (1, 1, 32, 32)    -
G_synthesis/noise7          -         (1, 1, 64, 64)    -
G_synthesis/noise8          -         (1, 1, 64, 64)    -
images_out                  -         (?, 3, 64, 64)    -
---                         ---       ---               ---
Total                       23819544


D                    Params    OutputShape       WeightShape
---                  ---       ---               ---
images_in            -         (?, 3, 64, 64)    -
labels_in            -         (?, 0)            -
64x64/FromRGB        1024      (?, 256, 64, 64)  (1, 1, 3, 256)
64x64/Conv0          590080    (?, 256, 64, 64)  (3, 3, 256, 256)
64x64/Conv1_down     1180160   (?, 512, 32, 32)  (3, 3, 256, 512)
64x64/Skip           131072    (?, 512, 32, 32)  (1, 1, 256, 512)
32x32/Conv0          2359808   (?, 512, 32, 32)  (3, 3, 512, 512)
32x32/Conv1_down     2359808   (?, 512, 16, 16)  (3, 3, 512, 512)
32x32/Skip           262144    (?, 512, 16, 16)  (1, 1, 512, 512)
16x16/Conv0          2359808   (?, 512, 16, 16)  (3, 3, 512, 512)
16x16/Conv1_down     2359808   (?, 512, 8, 8)    (3, 3, 512, 512)
16x16/Skip           262144    (?, 512, 8, 8)    (1, 1, 512, 512)
8x8/Conv0            2359808   (?, 512, 8, 8)    (3, 3, 512, 512)
8x8/Conv1_down       2359808   (?, 512, 4, 4)    (3, 3, 512, 512)
8x8/Skip             262144    (?, 512, 4, 4)    (1, 1, 512, 512)
4x4/MinibatchStddev  -         (?, 513, 4, 4)    -
4x4/Conv             2364416   (?, 512, 4, 4)    (3, 3, 513, 512)
4x4/Dense0           4194816   (?, 512)          (8192, 512)
Output               513       (?, 1)            (512, 1)
scores_out           -         (?, 1)            -
---                  ---       ---               ---
Total                23407361

2020-05-22 13:03:58.578847: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-05-22 13:03:58.961664: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-05-22 13:04:00.763442: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-05-22 13:04:01.548775: W tensorflow/core/common_runtime/bfc_allocator.cc:305] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.   
2020-05-22 13:04:01.651217: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
Building TensorFlow graph...
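
As a rough sanity check on the parameter totals printed in the log above, the sketch below estimates how much memory the weights themselves need. It assumes every parameter is stored as a 4-byte float32 and ignores optimizer state, extra copies of the networks, and activations, all of which add to the real footprint.

```python
# Parameter totals copied from the "Total" rows of the G and D tables above.
G_PARAMS = 23_819_544
D_PARAMS = 23_407_361

BYTES_PER_FLOAT32 = 4  # assumption: all weights are float32


def weights_mib(num_params, bytes_per_param=BYTES_PER_FLOAT32):
    """Memory needed to hold num_params values, in MiB."""
    return num_params * bytes_per_param / (1024 ** 2)


total_weights_mib = weights_mib(G_PARAMS + D_PARAMS)
print(f"G + D weights: {total_weights_mib:.1f} MiB")
```

By this estimate the weights take well under 1 GB, which suggests that most of the memory pressure comes from elsewhere, such as activations kept for backpropagation, rather than from the model size alone.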

Here's the contents of submit_config.txt written to the results folder for the above job:

{   'datasets': [],
    'host_name': 'localhost',
    'local': <dnnlib.submission.internal.local.TargetOptions object at 0x0000027CF0D20D48>,
    'num_gpus': 1,
    'nvprof': False,
    'platform_extras': <dnnlib.submission.submit.PlatformExtras object at 0x0000027CF0D20E08>,
    'print_info': False,
    'run_desc': 'stylegan2-customdata-1gpu-config-e',
    'run_dir': 'results\\00021-stylegan2-customdata-1gpu-config-e',
    'run_dir_extra_files': [],
    'run_dir_ignore': ['__pycache__', '*.pyproj', '*.sln', '*.suo', '.cache', '.idea', '.vs', '.vscode', '_cudacache'],
    'run_dir_root': 'results',
    'run_func_kwargs': {   'D_args': {'fmap_base': 8192, 'func_name': 'training.networks_stylegan2.D_stylegan2'},
                           'D_loss_args': {'func_name': 'training.loss.D_logistic_r1', 'gamma': 100},
                           'D_opt_args': {'beta1': 0.0, 'beta2': 0.99, 'epsilon': 1e-08},
                           'G_args': {'fmap_base': 8192, 'func_name': 'training.networks_stylegan2.G_main'},
                           'G_loss_args': {'func_name': 'training.loss.G_logistic_ns_pathreg'},
                           'G_opt_args': {'beta1': 0.0, 'beta2': 0.99, 'epsilon': 1e-08},
                           'data_dir': 'datasets',
                           'dataset_args': {'tfrecord_dir': 'customdata'},
                           'grid_args': {'layout': 'random', 'size': '8k'},
                           'image_snapshot_ticks': 10,
                           'metric_arg_list': [{'func_name': 'metrics.frechet_inception_distance.FID', 'minibatch_per_gpu': 8, 'name': 'fid50k', 'num_images': 50000}],
                           'mirror_augment': True,
                           'network_snapshot_ticks': 10,
                           'sched_args': {'D_lrate_base': 0.002, 'G_lrate_base': 0.002, 'minibatch_gpu_base': 1, 'minibatch_size_base': 8},
                           'tf_config': {'rnd.np_random_seed': 1000},
                           'total_kimg': 25000},
    'run_func_name': 'training.training_loop.training_loop',
    'run_id': 21,
    'run_name': '00021-stylegan2-customdata-1gpu-config-e',
    'submit_target': <SubmitTarget.LOCAL: 1>,
    'task_name': 'itsame-00021-stylegan2-customdata-1gpu-config-e',
    'user_name': 'itsame'}

I've trained other models with the same hardware, but I'm guessing stylegan2 requires a bit more space to work. Thanks for reading!

EDIT:

I've added some code to tfutil.py and now I have a different error! According to the web, I may need to downgrade CUDA.

# Cap TensorFlow at about a third of the GPU memory and allocate it on demand.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333, allow_growth=True)
graph_options = tf.GraphOptions(place_pruned_graph=True)
config_proto = tf.ConfigProto(gpu_options=gpu_options, graph_options=graph_options)
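
To put the fraction used above into numbers, here is a small sketch of the resulting memory cap. It assumes the card exposes 6144 MiB in total (the "6GB" from the title) and that per_process_gpu_memory_fraction is applied to that total. The helper name memory_cap_mib is made up for illustration.

```python
TOTAL_VRAM_MIB = 6144  # assumption: a 6 GB card, as stated in the title


def memory_cap_mib(total_mib, fraction):
    """Approximate upper bound on GPU memory for one process, in MiB."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return total_mib * fraction


cap = memory_cap_mib(TOTAL_VRAM_MIB, 0.333)
print(f"Cap at fraction 0.333: about {cap:.0f} MiB")
```

Under these assumptions the cap is roughly 2 GB, which is less than half of the 4630 MB that the log reports for the device without any cap, so a low fraction limits how much TensorFlow may use rather than making more memory available.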

error is now:

tensorflow.python.framework.errors_impl.InternalError: cudaErrorInvalidConfiguration
         [[node GPU0/G_loss/PathReg/G/G_synthesis/8x8/Upsample/UpFirDn2D (defined at C:\Python37\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]

EDIT 5/23/2020:

The above error seemed to go away on its own after reducing the batch size and using a much lower gpu memory fraction. I'm seeing this error now:

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[3,3,512,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node TrainG/Apply0/grad_acc_var_38/Assign (defined at C:\Python37\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

I'm going to try and reduce the tensor size to 256x256. I have no idea how to do that or what it means, but most of what I've read about this error seems to suggest that.
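
For reference, the shape in that message, [3,3,512,512], matches the WeightShape of the 3x3 convolutions in the network tables above (512 input and 512 output channels) rather than an image resolution. The sketch below works out how much memory a single tensor of that shape needs, assuming 4 bytes per float32 element.

```python
from functools import reduce
from operator import mul


def tensor_bytes(shape, bytes_per_element=4):
    """Size of a dense tensor in bytes (float32 by default)."""
    return reduce(mul, shape, 1) * bytes_per_element


# Shape copied from the ResourceExhaustedError message above.
oom_shape = (3, 3, 512, 512)
oom_bytes = tensor_bytes(oom_shape)
print(f"{oom_bytes} bytes = {oom_bytes / 1024 ** 2:.1f} MiB")
```

The failing allocation is only a few MiB, so it looks as if the allocator had already used up nearly all of the memory it was allowed, and this tensor just happened to be the request that no longer fit.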

Answer

According to the GitHub README:

One or more high-end NVIDIA GPUs, NVIDIA drivers, CUDA 10.0 toolkit and cuDNN 7.5. To reproduce the results reported in the paper, you need an NVIDIA GPU with at least 16 GB of DRAM.

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange