Fastest way to transfer vertex data to GPU in OpenGL / CUDA

https://stackoverflow.com/questions/18912615

29-06-2022

Question

I have to upload only specific elements (several thousand of them) of the vertex array on every frame. The alternative is to upload the whole region between the first and last changed value, but that is rather inefficient: in the worst case it re-uploads the entire array, and in any case many unchanged values get uploaded.

More generally, the question is: what are the fastest ways to upload vertex data to the GPU?

There are several ways to do it:

glBufferData() / glBufferSubData()  // Standard upload to buffer
glBufferData()                      // glBufferData with double buffer
glMapBuffer()                       // Mapping video memory
cudaMemcpy()                        // CUDA memcopy from host to device vertex buffer

Which one will be the fastest? I'm especially interested in the CUDA approach and how it differs from the standard OpenGL methods. Is it faster than glBufferData() or glMapBuffer()?

Solution

The speed of copying the same data from host to device should be similar no matter which copy API you use.

However, the size of the data block to be copied matters a lot. Here is a benchmark showing the relationship between the data size and the copy speed using CUDA's cudaMemcpy().

See: CUDA - how much slower is transferring over PCI-E?

[Figure: cudaMemcpy() copy speed versus data size; image not available]

You can estimate the average speed from the figure above if you know how many copy calls you will make and the data size of each copy.
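To make that estimate concrete, here is a rough sketch of a first-order cost model. The model itself (a fixed overhead per copy call plus bytes divided by bandwidth) is an assumption, not something stated in the benchmark, and the names LinkModel and estimate_copy_seconds are hypothetical. You would have to fill in the overhead and bandwidth values from your own measurements.

```cpp
#include <cassert>
#include <cstddef>

// Assumed first-order cost model (not taken from the benchmark itself):
// every copy call pays a fixed overhead, plus bytes / bandwidth.
struct LinkModel {
    double per_call_overhead_s;    // hypothetical fixed cost per copy call, in seconds
    double bandwidth_bytes_per_s;  // hypothetical sustained bandwidth for large copies
};

// Estimated total time for `calls` copies of `bytes_per_call` bytes each.
double estimate_copy_seconds(const LinkModel& link,
                             std::size_t calls,
                             std::size_t bytes_per_call) {
    assert(link.bandwidth_bytes_per_s > 0.0);
    const double per_call =
        link.per_call_overhead_s +
        static_cast<double>(bytes_per_call) / link.bandwidth_bytes_per_s;
    return static_cast<double>(calls) * per_call;
}
```

With placeholder values such as LinkModel{10e-6, 6e9} (10 microseconds per call, 6 GB/s), the model rates 5000 separate 16-byte copies as more expensive than a single 16 MB copy, because the per-call overhead dominates the small transfers.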

When the element size is small and the number of elements is large, copying only the changed elements individually from host to device by invoking the copy API thousands of times is definitely not a good idea, because each tiny copy reaches only a small fraction of the peak transfer speed.
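One possible middle ground between "one call per element" and "re-upload everything" is to merge nearby changed indices into a few contiguous ranges and issue one copy per range. The sketch below is only an illustration of that idea, not part of the original answer. The names Range, coalesce_dirty and max_gap are hypothetical, and a good max_gap value would have to be found by measuring on your hardware.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open range [first, last) of element indices to upload in one call.
struct Range {
    std::size_t first;
    std::size_t last;
};

// Merge sorted changed indices into ranges. Two changed elements end up in
// the same range when at most `max_gap` unchanged elements lie between them,
// trading a few redundant bytes for fewer copy calls.
std::vector<Range> coalesce_dirty(const std::vector<std::size_t>& sorted_dirty,
                                  std::size_t max_gap) {
    assert(std::is_sorted(sorted_dirty.begin(), sorted_dirty.end()));
    std::vector<Range> ranges;
    for (std::size_t idx : sorted_dirty) {
        if (!ranges.empty() && idx <= ranges.back().last + max_gap) {
            // Close enough to the previous range: extend it to cover idx.
            ranges.back().last = std::max(ranges.back().last, idx + 1);
        } else {
            // Too far away: start a new range containing only idx.
            ranges.push_back({idx, idx + 1});
        }
    }
    return ranges;
}
```

Each resulting range could then be uploaded with a single call, for example glBufferSubData(target, first * stride, (last - first) * stride, data + first * stride), where stride is the size of one element in bytes and data is a byte pointer to the host-side array. With max_gap set to 0 only strictly adjacent elements are merged, while a very large max_gap degenerates to uploading the whole region between the first and last changed value.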

Licensed under: CC-BY-SA with attribution
