If you allow `kernel_func` to finish (e.g. with `cudaDeviceSynchronize()`), then I doubt that `my_array` is still "occupying memory" as you suggest after the kernel completes, i.e. at the point of this comment:

`// my_array is still occupying memory at this point`

You could be more certain with a call to `cudaMemGetInfo()` at that point.
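As a rough sketch of that check (the label strings and the helper name are my own, not from your code), you can query free device memory before and after the kernel and compare:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print how much device memory is currently free.
// Call once before the allocation and once after the kernel completes;
// if the two "free" numbers match, my_array is no longer occupying memory.
void report_free_mem(const char *label) {
    size_t free_bytes = 0, total_bytes = 0;
    cudaError_t err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return;
    }
    printf("%s: %zu bytes free of %zu total\n", label, free_bytes, total_bytes);
}
```

Note that `cudaMemGetInfo()` reports for the whole device, so other processes using the GPU can make the numbers move for reasons unrelated to your allocations.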
Nevertheless, it's likely that what you're experiencing is memory fragmentation of some sort.
The only way I know of to "clean the slate" would be a call to `cudaDeviceReset()` at that point. However, that will kill any in-flight operations as well as any allocations on the GPU, so you should only do it when you have no other activity going on with the GPU, and you must re-allocate any GPU data that you need after the call to `cudaDeviceReset()`.
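A minimal sketch of that sequence, assuming a single device pointer you want back afterwards (`d_data` and `N` are placeholder names, not from your code):

```cuda
#include <cuda_runtime.h>

// "Clean the slate": destroy all device allocations and context state,
// then re-create whatever the application still needs.
void reset_and_reallocate(float **d_data, size_t N) {
    cudaDeviceSynchronize();   // let any outstanding GPU work finish first
    cudaDeviceReset();         // invalidates every prior device pointer and stream

    // After the reset, all previous device pointers are dangling;
    // re-allocate (and re-copy, if needed) the data you still require.
    cudaMalloc(d_data, N * sizeof(float));
}
```

Any streams, events, and module state created before the reset are destroyed as well, so those must also be re-created, not just the allocations.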
Certainly, if you can arrange your allocations using `cudaMalloc` instead, that might be easier.
Note that `cudaDeviceReset()` by itself is insufficient to restore a GPU to proper functional behavior. In order to accomplish that, the "owning" process must also terminate. See here.