Question

I'm using DirectCompute for general-purpose computing on the GPU. Currently, I'm trying to operate on a 1920x1080 texture. I call Dispatch(2, 1080, 1) with numthreads(960, 1, 1), which by my calculations covers the image exactly, with one thread per pixel.
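As a sanity check of that arithmetic, the sketch below mirrors Direct3D 11's compute-shader indexing in plain Python. The number of thread groups per dimension is a ceiling division, and each thread's pixel coordinate is SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID. The helper names are hypothetical. The limits used (1024 threads per group for cs_5_0, 65535 groups per Dispatch dimension) are the D3D11 values.

```python
# Hypothetical helpers that mirror D3D11 compute-shader indexing in plain Python.
MAX_THREADS_PER_GROUP = 1024      # cs_5_0 limit on x * y * z in numthreads
MAX_GROUPS_PER_DIMENSION = 65535  # limit on each Dispatch() argument

def groups_needed(extent, threads):
    """Ceiling division: groups of `threads` needed to cover `extent` pixels."""
    return (extent + threads - 1) // threads

def dispatch_for(width, height, numthreads):
    """Dispatch() arguments that cover a width x height image."""
    tx, ty, tz = numthreads
    assert tx * ty * tz <= MAX_THREADS_PER_GROUP, "too many threads per group"
    groups = (groups_needed(width, tx), groups_needed(height, ty), 1)
    assert all(g <= MAX_GROUPS_PER_DIMENSION for g in groups)
    return groups

def dispatch_thread_id(group_id, group_thread_id, numthreads):
    """SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID."""
    return tuple(g * n + t for g, n, t in zip(group_id, numthreads, group_thread_id))

numthreads = (960, 1, 1)
groups = dispatch_for(1920, 1080, numthreads)
threads = (groups[0] * groups[1] * groups[2]
           * numthreads[0] * numthreads[1] * numthreads[2])
last = dispatch_thread_id((1, 1079, 0), (959, 0, 0), numthreads)  # last thread
print(groups, threads, last)   # (2, 1080, 1) 2073600 (1919, 1079, 0)
```

If the width were not a multiple of the group size, `groups_needed` would add one partial group, and the surplus threads in it would need a bounds check before touching the texture.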

Now, as I understand it, all threads should run at the same time, right? In my code, I skip all computation when a pixel is black, and I've noticed a definite increase in performance when most of the image is black. However, when an object fills up the screen, performance drops drastically.

My question is: if all the threads run in parallel, the time to process a frame should be determined by the slowest thread, and the threads on black pixels would essentially just sit idle, right? So why am I seeing a slowdown when more pixels are processed? They should all be processed at the same time. Or have I got this all wrong?

Any help would be appreciated.

Solution

Not all threads execute concurrently. The exact numbers have probably changed a bit, but a few years ago a high-end GPU could keep around 16k threads in flight at a time, yet "only" a few hundred of them actually executed concurrently. (These are further subdivided into smaller subgroups, and every thread in such a subgroup runs in exact lockstep, instruction by instruction, branch by branch.) The rest were suspended, waiting for memory or otherwise blocked.
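To see why skipping black pixels only pays off when a whole subgroup is black, here is a toy cost model of lockstep execution in Python. The subgroup width of 32 (NVIDIA's warp size; AMD hardware has used 64-wide wavefronts) and the per-pixel costs are assumptions chosen for illustration. `group_cost` and `row_work` are hypothetical names.

```python
SIMD_WIDTH = 32            # lanes per lockstep subgroup (assumed; AMD has used 64)
CHEAP, EXPENSIVE = 1, 100  # hypothetical per-pixel costs, arbitrary units

def group_cost(lanes_black):
    """Cost of one lockstep subgroup.

    Every lane evaluates the 'is it black?' test. If any lane needs the
    expensive path, the subgroup executes it, and the early-out lanes sit
    masked off until it finishes, so they save no time at all."""
    cost = CHEAP
    if not all(lanes_black):
        cost += EXPENSIVE
    return cost

def row_work(row_black):
    """Total work for one row, split into SIMD_WIDTH-lane subgroups."""
    return sum(group_cost(row_black[i:i + SIMD_WIDTH])
               for i in range(0, len(row_black), SIMD_WIDTH))

W = 1920
all_black = [True] * W
one_lit = [False] + [True] * (W - 1)            # a single lit pixel
every_other = [i % 2 == 1 for i in range(W)]    # half the pixels lit, spread out
all_lit = [False] * W

print(row_work(all_black))    # 60
print(row_work(one_lit))      # 160
print(row_work(every_other))  # 6060
print(row_work(all_lit))      # 6060
```

In this model a single lit pixel costs as much as a fully lit subgroup, and scattered lit pixels cost as much as a fully lit row.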

So if your algorithm requires two million executions (1920x1080 is about 2.07 million pixels), only a fraction of them even exist as threads at any given time, and of those, only a fraction are actually executing in a single batch. Among the threads that are currently executing, groups of them are forced to run in exact lockstep, so there is no such thing as one thread exiting early: the entire group has to follow the same path. Different groups, however, can terminate at different times.
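The scheduling side can be sketched the same way: treat each lockstep group as a job with a fixed cost, and let a limited number of execution slots pick up the next job as soon as one frees up. The slot count (320), group width (32) and costs (1 for an all-black group, 101 otherwise) are hypothetical figures, and the greedy scheduler is a simplification of what real hardware does. The point is only that frame time tracks the total work, not the single slowest thread.

```python
import heapq

def frame_time(group_costs, concurrent_groups):
    """Makespan of a greedy scheduler: each group is issued to whichever
    execution slot frees up first. `concurrent_groups` is a hypothetical
    stand-in for how many groups the GPU really executes at once."""
    slots = [0] * concurrent_groups      # time at which each slot becomes free
    for cost in group_costs:
        free_at = heapq.heappop(slots)   # earliest available slot
        heapq.heappush(slots, free_at + cost)
    return max(slots)

GROUPS = 1920 * 1080 // 32   # 64800 lockstep groups of 32 threads (assumed width)
SLOTS = 320                  # hypothetical number executing concurrently

black = [1] * GROUPS         # every group takes the cheap early-out
lit = [101] * GROUPS         # every group contains at least one lit pixel
half = [101] * (GROUPS // 2) + [1] * (GROUPS // 2)  # object covers half the screen

print(frame_time(black, SLOTS))  # 203
print(frame_time(lit, SLOTS))    # 20503
print(frame_time(half, SLOTS))   # 10328
```

In this model, covering half the screen roughly halves the fully lit frame time. That is the gradual slowdown the question describes, rather than a fixed cost set by the worst thread.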

Yes, threading on the GPU is complicated.

OTHER TIPS

If your algorithm is very heavy and you use its output image when rendering to the backbuffer, it can create a stall: the backbuffer has to wait for the image to finish. Try using the result on the next frame instead, so you are one frame behind.
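One way to be "frame-behind" is to ping-pong between two output textures: frame n writes one texture while the renderer reads the one finished during frame n - 1. The Python class below is a hypothetical stand-in that only models which buffer is written and which is read each frame. In a real DirectCompute program the two entries would be two GPU textures, and the class and method names here are illustrative.

```python
class FrameBehind:
    """Hypothetical ping-pong scheme: the compute pass for frame n writes
    buffer n % 2, while the renderer reads the buffer written during frame
    n - 1, so rendering never waits on compute work submitted this frame."""

    def __init__(self):
        self.buffers = [None, None]   # stand-ins for two GPU textures
        self.frame = 0

    def compute(self, result):
        # This frame's compute output goes into the "current" buffer.
        self.buffers[self.frame % 2] = result

    def display(self):
        # Read the buffer the previous frame wrote (None on the first frame).
        return self.buffers[(self.frame - 1) % 2]

    def end_frame(self):
        self.frame += 1

fb = FrameBehind()
shown = []
for n in range(4):
    fb.compute(f"image {n}")
    shown.append(fb.display())
    fb.end_frame()
print(shown)   # [None, 'image 0', 'image 1', 'image 2']
```

The cost of this design is one frame of latency and a second texture; the benefit is that the compute pass and the draw that consumes its result no longer serialize within a frame.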

And what does your algorithm look like?

Licensed under: CC-BY-SA with attribution