Pergunta

Now CUDA allows dynamic allocation on the global memory. However, I couldn't find any reference to the scalability of that malloc function: is it any better than, for instance, preallocate a chunk of memory and then just assign the next memory chuck to a thread by atomically incrementing a global integer? This last "home-made" solution works but there is an obvious problem with scalability, so I wonder whether malloc takes care of that somehow.

Foi útil?

Solução

I think while your "home-made" solution might be just as good currently, although concurrent calls to a global integer might slow it down, Malloc would be my choice.

This is because it allows for Nvidia to deal with the headache of scalability and to make improvements, in either the hardware or software implementation, that you can taken advantage of just by re-compiling your code at a later date.

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top