Auto vectorization Region of interest (crop)

https://stackoverflow.com/questions/12477337

02-07-2021
|

Domanda

I have a library which has some image processing algorithms including a Region of Interest (crop) algorithm. When compiling with GCC, the auto vectorizer speeds up a lot of the code but worsens the performance of the Crop algorithm. Is there a way of flagging a certain loop to be ignored by the vectorizer or is there a better way of structuring the code for better performance?

for (RowIndex=0;RowIndex<Destination.GetRows();++RowIndex)
{
    rowOffsetS = ((OriginY + RowIndex) * SizeX) + OriginX;
    rowOffsetD = (RowIndex * Destination.GetColumns());
    for (ColumnIndex=0;ColumnIndex<Destination.GetColumns();++ColumnIndex)
    {
        BufferSPtr=BufferS + rowOffsetS + ColumnIndex;
        BufferDPtr=BufferD + rowOffsetD + ColumnIndex;
        *BufferDPtr=*BufferSPtr;
    }
}

Where SizeX is the width of the source OriginX is the left of the region of interest OriginY is the top of the region of interest

Soluzione

I haven't found anything about changing the optimization flags for a loop, however according to the documentation you can use the attribute optimize (look here and here) on a function to override the optimization settings for that function somewhat like this:

void foo() __attribute__((optimize("O2", "inline-functions")))

If you want to change it for several functions, you can use #pragma GCC optimize to set it for all following functions (look here).

So you should be able to compile the function containing crop with a different set of optimization flags, omitting the auto-vectorization. That has the disadvantage of hardcoding the compilation flags for that function, but is the best I found.

With regards to restructuring for better performance the two points I already mentioned in the comments come to mind (assuming the ranges can't overlap):

declaring the pointers as __restrict to tell the compiler that they don't alias (the area pointed to by one pointer won't be accessed by any other means inside the function). The possibility of pointer aliasing is a major stumbling block for the optimizer, since it can't easily reorder the accesses if it doesn't know if writing to BufferD will change the contents of BufferS.
Replacing the inner loop with a call to copy:
```
std::copy(BufferS + rowOffsetS, BufferS + rowOffsetS + Destination.GetColumns(), BufferD + rowOffsetD);
```
The copy function is likely to be pretty well optimized (probably forwarding the arguments to memmove), so that might make your code faster, while also making your code shorter (always a plus).

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a StackOverflow