Question

The automatic memory management of thrust::device_vector is really useful; the only drawback is that it cannot be used from within kernel code.

I've looked around on the Internet and have only found vector libraries such as Thrust, which manage device memory from host code. Does any vector library for kernels exist? If not, would it be a bad idea to write such a library?

Solution

It is possible to write such a library, but it would be very inefficient.

Indeed, thrust::device_vector differs from thrust::host_vector or std::vector only in that it allocates memory on the device instead of the host. The resizing algorithm is the same, and it runs on the host.
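To see what that means in practice, here is a minimal host-side sketch: every call below executes on the CPU, and a reallocation triggers a device-to-device copy behind the scenes.

```cpp
#include <thrust/device_vector.h>

int main() {
    thrust::device_vector<int> v;   // storage lives in device memory

    // Each push_back runs on the host; when capacity is exceeded, a new
    // device buffer is allocated and the old contents are copied over.
    for (int i = 0; i < 1000; ++i)
        v.push_back(i);

    v.resize(2000);                 // also driven entirely from the host
    return 0;
}
```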

The resize logic is quite simple, but it involves allocating/freeing memory and copying the data. In a multi-threaded setting, you have to lock the whole vector every time a thread resizes it, which can take a long time because of the copy.

In the case of a kernel that appends elements to a vector, the synchronization mechanism would effectively serialize the work, since only one thread at a time is allowed to resize. Your code would therefore run at the speed of a single device processor, further slowed by the (considerable) synchronization overhead. This would likely be much slower than a CPU implementation.
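To make the serialization concrete, here is a sketch of what a lock-guarded device-side push_back would have to look like. All names are hypothetical (no such library API exists); every appending thread spins on the same global lock, so appends execute strictly one at a time:

```cpp
// Hypothetical device-side append guarded by a global spinlock.
__device__ int d_lock = 0;       // 0 = free, 1 = held
__device__ int d_size = 0;       // current element count

__device__ void push_back(int *data, int capacity, int value) {
    // Spin until the lock is acquired: this serializes every
    // appending thread across the entire grid.
    while (atomicCAS(&d_lock, 0, 1) != 0) { }

    if (d_size < capacity)       // a real version would also resize here,
        data[d_size++] = value;  // i.e. allocate + copy while holding the lock

    __threadfence();             // make the write visible before release
    atomicExch(&d_lock, 0);      // release the lock
}
```

(On pre-Volta GPUs a per-thread spinlock like this can even deadlock within a warp; the sketch is shown only to illustrate the cost, not as a recommended pattern.)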

OTHER TIPS

Thrust cannot be used within a kernel; however, a thrust::device_vector can be used right up to the kernel boundary. At that point, a pointer to the underlying data can be passed to the kernel. For example:

thrust::device_vector<int> my_int_vector;

my_kernel<<<blocks, threads>>>(thrust::raw_pointer_cast(my_int_vector.data()));

Depending on your situation this may still mean the Thrust library is useful even when implementing your own kernels.
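A complete version of that pattern might look like the following sketch (the kernel name, sizes, and launch configuration are illustrative, not from the original answer):

```cpp
#include <thrust/device_vector.h>
#include <thrust/sequence.h>

// Illustrative kernel: doubles each element in place.
__global__ void my_kernel(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2;
}

int main() {
    thrust::device_vector<int> my_int_vector(1024);
    thrust::sequence(my_int_vector.begin(), my_int_vector.end()); // 0,1,2,...

    int n = static_cast<int>(my_int_vector.size());
    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Hand the raw device pointer to the kernel; the vector still owns
    // the memory and frees it when it goes out of scope.
    my_kernel<<<blocks, threads>>>(
        thrust::raw_pointer_cast(my_int_vector.data()), n);
    cudaDeviceSynchronize();
    return 0;
}
```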

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow