This happens because the CUDA driver has to be loaded and a CUDA context (the environment that holds all your data and programs on the device) has to be created every time, which requires a lot of bookkeeping. You can keep the driver loaded at all times by running the following as root:
nvidia-smi -pm 1
This enables the so-called "persistence mode" (pass 0 instead of 1 to disable it), which should speed up your initialisation.
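If you want to check whether persistence mode is actually on, something along these lines may help. It is only a sketch: the function name `persistence_status` is made up for illustration, and it assumes `nvidia-smi` is on your PATH (it prints a notice instead if the tool is missing or cannot reach the driver).

```shell
#!/bin/sh
# Hypothetical helper: report the persistence mode setting of each GPU.
# Assumes nvidia-smi is installed; falls back to a notice otherwise.
persistence_status() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found"
        return 0
    fi
    # One CSV line per GPU: index, persistence mode setting.
    nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader 2>/dev/null \
        || echo "nvidia-smi could not query the driver"
}

persistence_status
```

Querying the setting does not need root, only changing it with `-pm` does.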