OpenCL assembly optimization for “testing carry flag after adding”

https://stackoverflow.com/questions/12231622

29-06-2021

Question

In my OpenCL kernel, I find this:

error += y;
++y;
error += y;
// The following test may be implemented in assembly language in
// most machines by testing the carry flag after adding 'y' to
// the value of 'error' in the previous step, since 'error'
// nominally has a negative value.
if (error >= 0)
{
    error -= x;
    --x;
    error -= x;
}

Obviously, those operations can easily be optimized using some nifty assembly instructions. How can I optimize this code in OpenCL?

Solution

You don't. The OpenCL compiler decides what to do with the code, depending on the target hardware and the optimization settings, which can be set as pragmas in the kernel source or as build options passed to clBuildProgram when building the kernel. If the compiler is smart enough, it will use the nifty assembly instructions of the platform the kernel runs on. If not, well, it won't.
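To make that concrete, here is a minimal sketch of the step from the question, written in plain C so it can be compiled and checked on a host machine. The names circle_state and circle_step are hypothetical, introduced only for illustration; the arithmetic is copied from the question unchanged. The point is that the sign test stays an ordinary comparison in the source, and how it is lowered (flag test, predicated instruction, select) is left to whichever compiler builds it.

```c
/* Hypothetical container for the three variables used in the question. */
typedef struct {
    int x;
    int y;
    int error;
} circle_state;

/* Hypothetical helper: performs one iteration of the loop body from the
   question and returns the updated state. */
static circle_state circle_step(circle_state s)
{
    s.error += s.y;
    ++s.y;
    s.error += s.y;

    /* Plain sign test. No assembly is written here: the compiler for the
       target device picks the instructions that implement it. */
    if (s.error >= 0) {
        s.error -= s.x;
        --s.x;
        s.error -= s.x;
    }
    return s;
}
```

Starting from x = 5, y = 0, error = -5 (start values assumed here for a radius of 5), the first call leaves x unchanged and only advances y, while the third call is the first one that takes the branch and decrements x.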

Keep in mind that OpenCL is a general framework that targets many kinds of devices, not just your standard consumer-grade processor, so going "under the hood" is not really possible: assembly instructions differ from device to device. OpenCL is meant to be portable; if you start writing x86 opcodes in your kernel, how is it going to run on a graphics card, for instance?

If you need absolute maximum performance on a specific device, you shouldn't be using OpenCL, IMHO.

Licensed under: CC-BY-SA with attribution
