Question

I have a class (see the example below) that acts as a .NET wrapper for a CUDA memory structure,
allocated using cudaMalloc() and referenced using a member field of type IntPtr.
(The class uses DllImport of a native C DLL which wraps various CUDA functionality.)

The Dispose method checks whether the pointer is IntPtr.Zero and, if not, calls cudaFree(),
which successfully deallocates the memory (returns CUDA success),
and sets the pointer to IntPtr.Zero.

The finalizer calls the Dispose method.

The problem is that if the finalizer runs without Dispose having been called previously,
then the cudaFree() function sets an error code of "invalid device pointer".

I checked, and the address cudaFree() receives is the same address returned by cudaMalloc(), and Dispose() has not been called previously.

When I add an explicit call to Dispose(), the same address is freed successfully.

The only workaround I found was not to call the Dispose method from the finalizer; however, this could cause a memory leak if Dispose() is not always called.

Any ideas why this happens? I encountered the same issue with CUDA 2.2 and 2.3, under .NET 3.5 SP1, on Windows Vista 64-bit + GeForce 8800 and on Windows XP 32-bit + Quadro FX (not sure which number).

class CudaEntity : IDisposable
{
    private IntPtr dataPointer;

    public CudaEntity()
    {
        // Calls cudaMalloc() via DllImport,
        // receives the error code and throws an exception if it is not 0,
        // then assigns the returned address to this.dataPointer
    }

    public void Dispose()
    {
        if (this.dataPointer != IntPtr.Zero)
        {
            // Calls cudaFree() via DllImport,
            // receives the error code and throws an exception if it is not 0

            this.dataPointer = IntPtr.Zero;
        }
    }

    ~CudaEntity()
    {
        Dispose();
    }
}

{
    // This code works
    var myEntity = new CudaEntity();
    myEntity.Dispose();
}

{
    // This code causes an "invalid device pointer"
    // error on the finalizer's call to cudaFree()
    var myEntity = new CudaEntity();
}

Solution

The problem is that finalizers run on the runtime's dedicated finalizer thread, not on the thread that created the object, and a CUDA resource allocated in one host thread can't be used from another. A snippet from the CUDA Programming Guide:

Several host threads can execute device code on the same device, but by design, a host thread can execute device code on only one device. As a consequence, multiple host threads are required to execute device code on multiple devices. Also, any CUDA resources created through the runtime in one host thread cannot be used by the runtime from another host thread.
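
To see the thread switch for yourself, here is a minimal sketch that involves no CUDA at all. The names ThreadProbe and FinalizerThreadDemo are made up for illustration; the probe only records which managed thread ran its finalizer so it can be compared with the creating thread.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical probe type: remembers which managed thread ran its finalizer.
class ThreadProbe
{
    public static volatile int FinalizedOn = -1;

    ~ThreadProbe()
    {
        FinalizedOn = Thread.CurrentThread.ManagedThreadId;
    }
}

static class FinalizerThreadDemo
{
    // Kept out of line so that no local variable in Main keeps the probe alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int CreateProbe()
    {
        new ThreadProbe();
        return Thread.CurrentThread.ManagedThreadId;
    }

    static void Main()
    {
        int createdOn = CreateProbe();

        // Force a collection and wait until the finalizer queue is drained.
        GC.Collect();
        GC.WaitForPendingFinalizers();

        Console.WriteLine("created on thread {0}, finalized on thread {1}",
            createdOn, ThreadProbe.FinalizedOn);
    }
}
```

On the desktop CLR the two IDs should differ, which is exactly the situation in which cudaFree() rejects the pointer.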

Your best bet is the using statement, which ensures that Dispose() is always called at the end of the 'protected' code block, on the thread that executes that block:

using (CudaEntity ent = new CudaEntity())
{
    // work with ent here; ent.Dispose() runs when the block exits
}
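
If you also want the class to protect itself, one possible shape is sketched below. This is not your actual wrapper: ThreadBoundBuffer and the allocate/free delegates are hypothetical, and Marshal.AllocHGlobal/Marshal.FreeHGlobal only stand in for your cudaMalloc()/cudaFree() imports so the sample runs without CUDA. It also assumes managed thread IDs map one-to-one to host threads. Dispose() refuses to run on a foreign thread, and the finalizer never calls the native free; it only reports the leak.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical wrapper: frees only on the owning thread.
sealed class ThreadBoundBuffer : IDisposable
{
    private IntPtr dataPointer;
    private readonly int ownerThreadId;
    private readonly Action<IntPtr> free;

    public ThreadBoundBuffer(Func<IntPtr> allocate, Action<IntPtr> free)
    {
        this.free = free;
        this.ownerThreadId = Thread.CurrentThread.ManagedThreadId;
        this.dataPointer = allocate();
    }

    public bool IsDisposed
    {
        get { return dataPointer == IntPtr.Zero; }
    }

    public void Dispose()
    {
        if (dataPointer == IntPtr.Zero)
            return;

        if (Thread.CurrentThread.ManagedThreadId != ownerThreadId)
            throw new InvalidOperationException(
                "Dispose must run on the thread that allocated the buffer.");

        free(dataPointer);
        dataPointer = IntPtr.Zero;

        // Nothing is left for the finalizer to report.
        GC.SuppressFinalize(this);
    }

    ~ThreadBoundBuffer()
    {
        // Runs on the finalizer thread, so the native free is deliberately skipped.
        if (dataPointer != IntPtr.Zero)
            Console.Error.WriteLine("ThreadBoundBuffer leaked: Dispose() was never called.");
    }
}

static class Demo
{
    static void Main()
    {
        // Stand-ins for the cudaMalloc()/cudaFree() wrappers (assumption).
        using (var buffer = new ThreadBoundBuffer(
            () => Marshal.AllocHGlobal(256),
            Marshal.FreeHGlobal))
        {
            Console.WriteLine("disposed inside the block: {0}", buffer.IsDisposed);
        }
    }
}
```

The memory still leaks if Dispose() is forgotten, but the message from the finalizer makes the missing call visible during testing instead of surfacing as an "invalid device pointer" error.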
Licensed under: CC-BY-SA with attribution