Question

I have run into some challenges with my Master's thesis that I hope you can help me with, or at least point me in the right direction.

I'm implementing Progressive Photon Mapping in OptiX, following the approach by Knaus and Zwicker (http://www.cs.jhu.edu/~misha/ReadingSeminar/Papers/Knaus11.pdf). This approach makes each iteration/frame of PPM independent, which makes it more suitable for multi-GPU rendering.

What I do (with a single GPU) is trace a number of photons using OptiX and store them in a buffer. The photons are then sorted into a spatial hash map using CUDA and Thrust, never leaving the GPU. I want to do the spatial hash map creation on the GPU since it is the bottleneck of my renderer. Finally, this buffer is used during indirect radiance estimation. So this is a multi-pass algorithm, consisting of ray tracing, photon tracing, photon map generation and, finally, image generation.
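For concreteness, here is a simplified sketch of the photon map step (the `Photon` struct, `HashPhoton` functor and grid parameters are illustrative, not my actual code): compute one hash key per photon with Thrust, then sort the photons by key so that photons in the same cell become contiguous.

```cpp
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/sort.h>

struct Photon {
    float3 position;
    float3 power;
    float3 direction;
};

struct HashPhoton {
    float cellSize;
    uint3 gridSize;

    __host__ __device__
    unsigned int operator()(const Photon& p) const {
        // Quantize the photon position to a grid cell and flatten the cell
        // coordinate into a single key (positions assumed non-negative,
        // i.e. already offset by the scene bounding box minimum).
        unsigned int x = static_cast<unsigned int>(p.position.x / cellSize) % gridSize.x;
        unsigned int y = static_cast<unsigned int>(p.position.y / cellSize) % gridSize.y;
        unsigned int z = static_cast<unsigned int>(p.position.z / cellSize) % gridSize.z;
        return (z * gridSize.y + y) * gridSize.x + x;
    }
};

void buildHashMap(thrust::device_vector<Photon>& photons,
                  thrust::device_vector<unsigned int>& keys,
                  float cellSize, uint3 gridSize)
{
    keys.resize(photons.size());

    // 1. Compute a hash key (cell index) per photon.
    thrust::transform(photons.begin(), photons.end(), keys.begin(),
                      HashPhoton{cellSize, gridSize});

    // 2. Sort photons by key so photons in the same cell are contiguous.
    thrust::sort_by_key(keys.begin(), keys.end(), photons.begin());

    // A per-cell start/end table can then be built with
    // thrust::lower_bound / thrust::upper_bound over the sorted keys.
}
```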

I understand that OptiX can support multiple GPUs. Each context launch is divided up across the GPUs. Any writes to buffers seem to be serialized and broadcast to each device so that their buffer contents are the same.

What I would like to do is let one GPU render one frame while the second GPU renders the next frame. I can then combine the results, for instance on the CPU or on one of the GPUs in a combine pass. It would also be acceptable if I could do each pass in parallel on each device (synchronizing between passes). Is this somehow possible?
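The combine pass itself should be trivial, since in the Knaus/Zwicker formulation the final image is just the average over the independent iterations. A minimal CPU-side sketch (RGBA float buffers are just an example layout):

```cpp
#include <vector>
#include <cstddef>

// Fold a frame produced by one GPU into a running average of all frames.
void accumulateFrame(std::vector<float>& accum,       // running average image
                     const std::vector<float>& frame, // frame from one GPU
                     std::size_t frameIndex)          // 0-based frame count
{
    const float w = 1.0f / static_cast<float>(frameIndex + 1);
    for (std::size_t i = 0; i < accum.size(); ++i) {
        // Incremental mean: avg_{n+1} = avg_n + (x - avg_n) / (n + 1)
        accum[i] += (frame[i] - accum[i]) * w;
    }
}
```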

For instance, could I create two OptiX contexts, each mapped to one device, on two different host threads? This would allow me to do the CUDA/Thrust spatial hash map generation as before, assuming the photons stay on one device, and merge the two generated images at the end of the pipeline. However, the programming guide states that it does not support multi-threaded context handling. I could use multiple processes instead, but that brings a lot of inter-process communication overhead. That approach would also duplicate work such as creating the scene geometry, compiling the PTX files, and so on.
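To illustrate what I mean by one context per device, here is a rough sketch based on my reading of the (pre-7) OptiX C++ wrapper; please treat the exact calls, in particular `setDevices`, as my assumption and not as verified multi-threaded usage:

```cpp
#include <optixu/optixpp_namespace.h>
#include <vector>

// Create a context restricted to a single CUDA device, so that each
// context (and hence each frame) runs on its own GPU. Each such context
// would live in its own host thread or process.
optix::Context createContextForDevice(int deviceOrdinal)
{
    optix::Context ctx = optix::Context::create();

    std::vector<int> devices(1, deviceOrdinal);
    ctx->setDevices(devices.begin(), devices.end());

    // Scene setup, PTX program loading, etc. would have to be repeated
    // per context, which is the duplicated work mentioned above.
    return ctx;
}
```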

Thanks!

No correct solution

OTHER TIPS

OptiX already splits the workload according to the compute power of your GPUs, so your approach will likely not be faster than letting OptiX manage all the GPUs itself.

If you want to force your data to remain on the device (note that in such a situation writes from different devices will not be coherent), you can use the RT_BUFFER_GPU_LOCAL flag as described in the programming guide:

https://developer.nvidia.com/optix-documentation
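For example, a hedged sketch using the (pre-7) OptiX C++ wrapper; the flag is documented for RT_BUFFER_INPUT_OUTPUT buffers, and each device then keeps its own copy whose contents are not synchronized across GPUs or readable from the host:

```cpp
#include <optixu/optixpp_namespace.h>

// Create a per-device (GPU-local) photon buffer. Element size and format
// are illustrative; adapt them to your photon struct.
optix::Buffer createGpuLocalPhotonBuffer(optix::Context& ctx, RTsize numPhotons)
{
    optix::Buffer buf = ctx->createBuffer(RT_BUFFER_INPUT_OUTPUT | RT_BUFFER_GPU_LOCAL);
    buf->setFormat(RT_FORMAT_USER);
    buf->setElementSize(sizeof(float) * 12); // e.g. position, power, direction, padding
    buf->setSize(numPhotons);
    return buf;
}
```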

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow