Question

I'm processing items in a grid; depending on an item's type, a different computation/function needs to be performed. But I've read that branching between work items that are doing the same thing is a very bad idea. To work around this I could split the grid into one grid per type (I would only need two in this particular case)...

What would be better in this case: leaving the branching in there, or making two grids, one for each type? I understand this depends on what happens inside the branch (compute bound) versus how big the grids will be (memory/latency bound).

Are there some ground rules to follow for these kinds of decisions, or is there a consensus on which one is better in general?

Edit: The (spatial) grid is not sparse, as is usual with spatial grids, but a dense array (no empty elements) of structs (~200 bytes per struct) which will hold up to about 500,000 elements.

I fill this array from another source; using that source I put either triangles or line segments in there.

Then, using this grid, I'll need to do either line-segment/line-segment or line-segment/triangle collision detection. So the question is whether it will be more efficient to fill two separate arrays (for the sake of argument, say 250,000 elements x 200 bytes each) and have work items do batch computations for only line/line or line/triangle... or to have one big array of 500,000 x 200 bytes and have each work item figure out which computation to perform given the type.
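
To make the layouts concrete, here is a rough OpenCL C sketch of the single-array variant (the Element struct, the type encoding and the test_* helpers are made up for illustration; the real struct is ~200 bytes and the real helpers would do the actual intersection math):

    // Hypothetical element layout: a type tag plus the geometry payload
    // (a stand-in for the real ~200-byte struct).
    typedef struct {
        int   type;        // 0 = line segment, 1 = triangle (illustrative)
        float data[48];
    } Element;

    // Placeholder "collision tests" so the sketch compiles; the real ones
    // would do line/line and line/triangle intersection.
    int test_line_line(__global const Element* e)     { return e->data[0] > 0.0f; }
    int test_line_triangle(__global const Element* e) { return e->data[1] > 0.0f; }

    // One big mixed array: every work item inspects the tag and branches.
    __kernel void collide_mixed(__global const Element* items,
                                __global int* hits,
                                const int count)
    {
        int i = get_global_id(0);
        if (i >= count) return;

        if (items[i].type == 0)
            hits[i] = test_line_line(&items[i]);
        else
            hits[i] = test_line_triangle(&items[i]);
    }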


Solution

It depends on the structure of your new grids, and also of your old one.

Let's take the worst case: a normal rectangular grid (like an image) where every odd item is of type 1 and every even item is of type 2. Now basically half of your threads will sit idle on the GPU (while the type-1 work is being computed, the type-2 threads 'idle'). This is because the work items within a workgroup generally share their program counter.
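
A minimal sketch of that worst case (one work item per cell; the arithmetic is just a placeholder for the per-type work):

    // Types alternate, so within every workgroup both branches are taken
    // and each work item idles while the "other" branch executes.
    __kernel void worst_case(__global const float* cells,
                             __global float* out,
                             const int count)
    {
        int i = get_global_id(0);
        if (i >= count) return;

        if (i % 2 == 0)
            out[i] = cells[i] * 2.0f;   // "type 1" work; type-2 lanes wait here
        else
            out[i] = cells[i] + 1.0f;   // "type 2" work; type-1 lanes wait here
    }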

If your new "grids" are just 2 kernel calls over the same data with a simple "not of this type? return", then it's worse than the first case. However, if you manage to build 2 grids in which every item is of the correct type, then it's far better to split.
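
A sketch of that difference, reusing the hypothetical Element struct and test_line_line placeholder from the sketch in the question:

    // Version A: a "lines" kernel that still runs over the MIXED array and
    // only filters by type. Every work item still loads its element and the
    // idling is still there, so two calls like this do not help.
    __kernel void lines_filtered(__global const Element* mixed,
                                 __global int* hits,
                                 const int count)
    {
        int i = get_global_id(0);
        if (i >= count) return;
        if (mixed[i].type != 0) return;   // "not of this type? return"
        hits[i] = test_line_line(&mixed[i]);
    }

    // Version B: the array was compacted beforehand and holds only lines.
    // There is no type branch at all; every work item does identical work.
    __kernel void lines_dense(__global const Element* lines,
                              __global int* hits,
                              const int count)
    {
        int i = get_global_id(0);
        if (i >= count) return;
        hits[i] = test_line_line(&lines[i]);
    }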

If your original grid is an image split into exactly 2 halves, it probably doesn't matter. Only the workgroups straddling the boundary will perform extra work.

Branches are not that evil. Just think of it like this: whenever you have a branch and even a single thread within a workgroup (or whatever the unit of scheduling is in your hardware) takes a different direction from the others, the code in both branches will effectively be executed everywhere.

That is also the reason why optimizations such as skipping an expensive computation when some special condition applies do not work in general on a GPU: if the other threads don't fulfill the condition, you will still effectively pay for the calculation in every thread.
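
As a sketch of why such a guard does not save time (the condition and the loop are arbitrary stand-ins for "some special condition" and "an expensive computation"):

    __kernel void guarded(__global const float* x,
                          __global float* y,
                          const int count)
    {
        int i = get_global_id(0);
        if (i >= count) return;

        float v = x[i];
        if (v > 0.5f) {
            // Expensive path: if even one work item in the scheduling unit
            // ends up here, the whole unit spends this time anyway.
            float acc = 0.0f;
            for (int k = 0; k < 1000; ++k)
                acc += sin(v * (float)k);
            y[i] = acc;
        } else {
            y[i] = v;                     // cheap path
        }
    }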

Other tips

There is no general rule for this; it depends on the case. If you branch around a lot of code, it is obviously better to rearrange the memory. However, if your branch is just 2 instructions, then do not reshape the memory.

I would first classify how many items you have of each type (CPU side or with a simple kernel), and then run a specific kernel for each type of item. However, this may not be good for your case.
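
One possible shape for that "simple kernel", again using the hypothetical Element layout from the question: scatter each element into a per-type buffer with an atomic counter, then launch the type-specific kernels over the compacted buffers. This classification kernel diverges itself, but you pay that cost only once.

    __kernel void split_by_type(__global const Element* mixed,
                                const int count,
                                __global Element* lines,
                                __global Element* triangles,
                                __global int* counters)  // counters[0] = lines, counters[1] = triangles
    {
        int i = get_global_id(0);
        if (i >= count) return;

        Element e = mixed[i];             // private copy of the ~200-byte struct
        if (e.type == 0)
            lines[atomic_inc(&counters[0])] = e;
        else
            triangles[atomic_inc(&counters[1])] = e;
    }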

If you can post some code, maybe we can point you in the right direction.

License: CC-BY-SA with attribution