Auto vectorization Region of interest (crop)
02-07-2021
Question
I have a library which has some image processing algorithms including a Region of Interest (crop) algorithm. When compiling with GCC, the auto vectorizer speeds up a lot of the code but worsens the performance of the Crop algorithm. Is there a way of flagging a certain loop to be ignored by the vectorizer or is there a better way of structuring the code for better performance?
for (RowIndex = 0; RowIndex < Destination.GetRows(); ++RowIndex)
{
    rowOffsetS = ((OriginY + RowIndex) * SizeX) + OriginX;
    rowOffsetD = (RowIndex * Destination.GetColumns());
    for (ColumnIndex = 0; ColumnIndex < Destination.GetColumns(); ++ColumnIndex)
    {
        BufferSPtr = BufferS + rowOffsetS + ColumnIndex;
        BufferDPtr = BufferD + rowOffsetD + ColumnIndex;
        *BufferDPtr = *BufferSPtr;
    }
}
Where SizeX is the width of the source, OriginX is the left edge of the region of interest, and OriginY is the top edge of the region of interest.
Solution
I haven't found a way to change the optimization flags for a single loop. However, according to the GCC documentation, you can put the optimize attribute on a function to override the optimization settings for that function, somewhat like this:
void foo() __attribute__((optimize("O2", "inline-functions")));
If you want to change it for several functions, you can use #pragma GCC optimize to set it for all following functions.
So you should be able to compile the function containing the crop loop with a different set of optimization flags that leaves out auto-vectorization (for example, by passing "no-tree-vectorize", which corresponds to -fno-tree-vectorize). This has the disadvantage of hardcoding the compilation flags for that function, but it is the best I have found.
With regard to restructuring for better performance, the two points I already mentioned in the comments come to mind (assuming the ranges can't overlap):
Declaring the pointers as __restrict to tell the compiler that they don't alias (the area pointed to by one pointer won't be accessed by any other means inside the function). The possibility of pointer aliasing is a major stumbling block for the optimizer, since it can't easily reorder the accesses if it doesn't know whether writing to BufferD will change the contents of BufferS.
Replacing the inner loop with a call to std::copy:
std::copy(BufferS + rowOffsetS, BufferS + rowOffsetS + Destination.GetColumns(), BufferD + rowOffsetD);
The std::copy function is likely to be pretty well optimized (probably forwarding the arguments to memmove), so that might make your code faster, while also making your code shorter (always a plus).
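Putting both points together, a minimal sketch of the crop could look like the following. The function name CropRows, the parameter names, and the unsigned char pixel type are assumptions for illustration, not taken from the question. __restrict is a compiler extension (accepted by GCC, Clang and MSVC), and it is only valid here if the source and destination buffers really do not overlap.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical crop routine: copies a rows x columns region whose top-left
// corner is (originX, originY) out of a source image that is sizeX wide.
// Assumes 8-bit pixels and non-overlapping buffers (hence __restrict).
void CropRows(const unsigned char* __restrict source,
              unsigned char* __restrict destination,
              std::size_t sizeX, std::size_t originX, std::size_t originY,
              std::size_t rows, std::size_t columns)
{
    for (std::size_t row = 0; row < rows; ++row)
    {
        // First pixel of this row inside the region of interest.
        const unsigned char* rowBegin = source + (originY + row) * sizeX + originX;
        // Copy one contiguous row; the library may lower this to memmove.
        std::copy(rowBegin, rowBegin + columns, destination + row * columns);
    }
}
```

Each row of the region is contiguous in memory, which is why a single std::copy per row can replace the element-by-element inner loop.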