Question

I'm processing items in a grid; depending on an item's type, a different computation/function needs to be performed. But I've read that branching between work items that are doing the same thing is a very bad idea. To work around this I could split the grid into one grid per type (I would only need two in this particular case)...

What would be better in this case: leaving the branching in there, or making two grids, one for each type? I understand this depends on what happens inside the branch (compute bound) versus how big the grids will be (memory/latency bound).

Are there some ground rules to follow for these kinds of decisions, or is there a consensus on which one is better in general?

Edit: The (spatial) grid is not sparse, as is usual with spatial grids, but a dense array (no empty elements) of structs (~200 bytes per struct) which will hold up to about 500,000 elements.

I fill this array from another source; using that source I put either triangles or line segments in there.

Then, using this grid, I'll need to do either line-segment/line-segment or line-segment/triangle collision detection. So the question is whether it will be more efficient to fill two separate arrays (for the sake of argument, say 250,000 elements x 200 bytes each) and have work items do batch computations for only line/line or line/triangle... or to have one big array of 500,000 x 200 bytes and have each work item figure out which computation to perform given the type.
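
To make the layouts concrete, here is a rough OpenCL C sketch of the single-array variant (the Element struct, the type encoding and the test_* helpers are made up for illustration; the real struct is ~200 bytes and the real helpers would do the actual intersection math):

    // Hypothetical element layout: a type tag plus the geometry payload
    // (a stand-in for the real ~200-byte struct).
    typedef struct {
        int   type;        // 0 = line segment, 1 = triangle (illustrative)
        float data[48];
    } Element;

    // Placeholder "collision tests" so the sketch compiles; the real ones
    // would do line/line and line/triangle intersection.
    int test_line_line(__global const Element* e)     { return e->data[0] > 0.0f; }
    int test_line_triangle(__global const Element* e) { return e->data[1] > 0.0f; }

    // One big mixed array: every work item inspects the tag and branches.
    __kernel void collide_mixed(__global const Element* items,
                                __global int* hits,
                                const int count)
    {
        int i = get_global_id(0);
        if (i >= count) return;

        if (items[i].type == 0)
            hits[i] = test_line_line(&items[i]);
        else
            hits[i] = test_line_triangle(&items[i]);
    }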


Solution

It depends on the structure of your new grids, and also of your old one.

Let's take the worst case: a normal rectangular grid (like an image) where every odd item is of type 1 and every even item is of type 2. Now basically half of your threads will sit idle on the GPU (while the type-1 work is being computed, the type-2 threads 'idle'). This is because the work items within a workgroup generally share their program counter.
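
A minimal sketch of that worst case (one work item per cell; the arithmetic is just a placeholder for the per-type work):

    // Types alternate, so within every workgroup both branches are taken
    // and each work item idles while the "other" branch executes.
    __kernel void worst_case(__global const float* cells,
                             __global float* out,
                             const int count)
    {
        int i = get_global_id(0);
        if (i >= count) return;

        if (i % 2 == 0)
            out[i] = cells[i] * 2.0f;   // "type 1" work; type-2 lanes wait here
        else
            out[i] = cells[i] + 1.0f;   // "type 2" work; type-1 lanes wait here
    }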

If your new "grids" are just 2 kernel calls over the same data with a simple "not of this type? return", then it's worse than the first case. However, if you manage to build 2 grids in which every item is of the correct type, then it's far better to split.
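
A sketch of that difference, reusing the hypothetical Element struct and test_line_line placeholder from the sketch in the question:

    // Version A: a "lines" kernel that still runs over the MIXED array and
    // only filters by type. Every work item still loads its element and the
    // idling is still there, so two calls like this do not help.
    __kernel void lines_filtered(__global const Element* mixed,
                                 __global int* hits,
                                 const int count)
    {
        int i = get_global_id(0);
        if (i >= count) return;
        if (mixed[i].type != 0) return;   // "not of this type? return"
        hits[i] = test_line_line(&mixed[i]);
    }

    // Version B: the array was compacted beforehand and holds only lines.
    // There is no type branch at all; every work item does identical work.
    __kernel void lines_dense(__global const Element* lines,
                              __global int* hits,
                              const int count)
    {
        int i = get_global_id(0);
        if (i >= count) return;
        hits[i] = test_line_line(&lines[i]);
    }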

If your original grid is an image split into exactly 2 halves, it probably doesn't matter. Only the workgroups straddling the boundary will perform extra work.

Branches are not that evil. Just think of it like this: whenever you have a branch and even a single thread within a workgroup (or whatever the unit of scheduling is in your hardware) takes a different direction from the others, the code in both branches will effectively be executed everywhere.

That is also the reason why optimizations such as skipping an expensive computation when some special condition applies do not work in general on a GPU: if the other threads don't fulfill the condition, you will still effectively pay for the calculation in every thread.
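
As a sketch of why such a guard does not save time (the condition and the loop are arbitrary stand-ins for "some special condition" and "an expensive computation"):

    __kernel void guarded(__global const float* x,
                          __global float* y,
                          const int count)
    {
        int i = get_global_id(0);
        if (i >= count) return;

        float v = x[i];
        if (v > 0.5f) {
            // Expensive path: if even one work item in the scheduling unit
            // ends up here, the whole unit spends this time anyway.
            float acc = 0.0f;
            for (int k = 0; k < 1000; ++k)
                acc += sin(v * (float)k);
            y[i] = acc;
        } else {
            y[i] = v;                     // cheap path
        }
    }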

Other tips

There is no general rule for this; it depends on the case. If you branch around a lot of code, it is obviously better to rearrange the memory. However, if your branch is just 2 instructions, then do not reshape the memory.

I would first classify how many items you have of each type (CPU side or with a simple kernel), and then run a specific kernel for each type of item. However, this may not be good for your case.
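
One possible shape for that "simple kernel", again using the hypothetical Element layout from the question: scatter each element into a per-type buffer with an atomic counter, then launch the type-specific kernels over the compacted buffers. This classification kernel diverges itself, but you pay that cost only once.

    __kernel void split_by_type(__global const Element* mixed,
                                const int count,
                                __global Element* lines,
                                __global Element* triangles,
                                __global int* counters)  // counters[0] = lines, counters[1] = triangles
    {
        int i = get_global_id(0);
        if (i >= count) return;

        Element e = mixed[i];             // private copy of the ~200-byte struct
        if (e.type == 0)
            lines[atomic_inc(&counters[0])] = e;
        else
            triangles[atomic_inc(&counters[1])] = e;
    }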

If you can post some code, maybe we can point you in the right direction.

License: CC-BY-SA with attribution