Question

I would like to use texture memory to interpolate data. I have two arrays (A[i] and B[i]) and I want to interpolate between them. I thought I could bind them to texture memory and set the interpolation mode, but I am not sure how to do that.

The examples that come with CUDA use A[i-1] and A[i+1] for the interpolation.

Is there any way to do what I planned? I'm trying this because I think I can get a good speedup.


Solution

Yes, you can do this with texture memory, and it is fast. I personally use ArrayFire to accomplish these kinds of operations, because it is faster than I can hope to code by hand.

If you want to code it by hand in CUDA, something like this is what you want:

// outside kernel (texture reference at file scope, setup in host code)

texture<float,1> A;                        // declare a second texture reference B the same way
cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
cudaArray *arr = NULL;
cudaError_t e = cudaMallocArray(&arr, &desc, length, 0);   // 1D array holding 'length' floats
cudaMemcpyToArray(arr, 0, 0, h_A, length * sizeof(float), cudaMemcpyHostToDevice); // h_A = your host data
A.filterMode = cudaFilterModePoint;
A.addressMode[0] = cudaAddressModeClamp;
cudaBindTextureToArray(A, arr, desc);

...

// inside kernel

float valA = tex1D(A, idx);                // tex1D takes the texture reference and one coordinate
float valB = tex1D(B, idx);

float f = 0.5f;
float output = f * valA + (1.0f - f) * valB;
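
For completeness, here is a rough sketch of how those pieces can fit together into a full kernel and launch; the kernel name, block size, and the device pointer d_output are illustrative placeholders, not something from the original answer:

__global__ void blendKernel(float *output, float f, int length)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < length)
    {
        float valA = tex1D(A, idx);   // A and B are the texture references bound above
        float valB = tex1D(B, idx);
        output[idx] = f * valA + (1.0f - f) * valB;
    }
}

// host side: one thread per element
blendKernel<<<(length + 255) / 256, 256>>>(d_output, 0.5f, length);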

If you want to just plug in ArrayFire (which in my experience is faster than what I can code by hand, not to mention much simpler to use), then you'll want:

// in ArrayFire
array A = randu(10, 1);
array B = randu(10, 1);
float f = 0.5f;
array C = f * A + (1 - f) * B;

The above assumes you want to interpolate between corresponding indices of 2 different arrays or matrices. There are other interpolation functions available too.
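For example, if what you actually need is positional interpolation within a single array (like the A[i-1]/A[i+1] case from the CUDA samples), ArrayFire's approx1 does that; a minimal sketch, with the sample positions chosen arbitrarily here:

// sample A at fractional positions with linear interpolation
array A   = randu(10, 1);
array pos = range(dim4(17)) * 0.5;             // positions 0.0, 0.5, 1.0, ..., 8.0
array Ai  = approx1(A, pos, AF_INTERP_LINEAR);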

OTHER TIPS

If you're not used to developing with CUDA, using texture memory is not the easiest thing to start with.

I'd suggest trying to write a first parallel version of your algorithm in CUDA with no optimization. Then, use the NVIDIA Visual Profiler on your application to figure out whether you need to set up texture memory to optimize your memory accesses.
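
For example, a first unoptimized version of the blend can be nothing more than plain global-memory loads, one thread per element (a sketch with illustrative names):

__global__ void lerp(const float *A, const float *B, float *C, float f, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        C[i] = f * A[i] + (1.0f - f) * B[i];
}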

Remember that the earlier you optimize, the trickier it is to debug.

Last but not least, the latest CUDA version (CUDA 5, still a release candidate) is able to automatically serve your data through texture memory as long as you declare the input buffers passed as parameters to your kernel as const __restrict__ pointers.
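
Concretely, that just means adding the qualifiers to the kernel's parameter list; the sketch below is the same simple kernel as above with the qualifiers added, so that on hardware which supports it the compiler can route the reads of A and B through the read-only (texture) cache:

__global__ void lerp(const float * __restrict__ A,
                     const float * __restrict__ B,
                     float *C, float f, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        C[i] = f * A[i] + (1.0f - f) * B[i];
}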

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow