Question

I want to write a function that converts BGRA to BGR:

void convertBGRAViewtoBGRView( const boost::gil::bgra8_view_t &src, boost::gil::bgr8_view_t dst )

If I write it like this:

size_t numPixels = src.width() * src.height();
boost::gil::bgra8_view_t::iterator it = src.begin();
boost::gil::bgr8_view_t::iterator itD = dst.begin();
for( size_t i = 0; i < numPixels; ++i ){

    // copy the B, G and R channels, dropping alpha
    boost::gil::bgra8_pixel_t pixe( *it );
    ++it;
    boost::gil::bgr8_pixel_t pix( pixe[0], pixe[1], pixe[2] );

    *itD++ = pix;
}

It works, but it is very slow. I want to use NEON instructions, so I need a raw pointer, for example (UInt8*) or (UInt32*). I tried it like this:

// treats the whole view as one contiguous block of BGRA pixels
UInt32 *temp = (UInt32*)&src(0,0);
for( size_t i = 0; i < numPixels; ++i ){
    boost::gil::bgr8_pixel_t pixk( (( *temp) & 0xff), ( (*temp>>8) & 0xff), ((*temp >> 16 )& 0xff));
    *itD++ = pixk;
    temp += 1;
}

This works more or less, but the resulting image isn't correct. I think it may be an alignment problem. Does anyone have an idea how to get it to work? This version is about 3 times faster than the one with the iterator.

UPDATE: I checked with the debugger: the src is 480x360, and up to i == 259 everything is correct, but afterwards the results of the iterator version and the pointer version differ.
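Results that diverge partway into the image, rather than from the first pixel, are consistent with padded rows. As a sketch (the helper names are hypothetical, and the 1472-byte row stride is only an assumed example value), the byte offset of pixel (x, y) in a padded buffer is y * rowStride + x * 4, while the flat pointer loop uses i * 4. The two agree within the first row, and for later rows only when rowStride equals width * 4:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: byte offset of pixel (x, y) in an interleaved buffer
// whose rows start rowStride bytes apart (rowStride may exceed width * 4).
std::size_t pixelOffset(std::size_t x, std::size_t y, std::size_t rowStride,
                        std::size_t bytesPerPixel = 4) {
    // the pixel must lie inside one row
    assert(rowStride >= (x + 1) * bytesPerPixel);
    return y * rowStride + x * bytesPerPixel;
}

// What the flat loop effectively computes for pixel index i:
// it assumes rows are packed back to back with no padding.
std::size_t flatOffset(std::size_t i, std::size_t bytesPerPixel = 4) {
    return i * bytesPerPixel;
}
```

For a 360-pixel-wide image with an assumed 1472-byte stride, pixel 359 of row 0 is at byte 1436 either way, but the first pixel of row 1 is at byte 1472, not at flatOffset(360) == 1440.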

Thanks.


Solution

After some computation based on your answer, I found that 360*4 = 1440 bytes is divisible by 32 but not by 64, whereas 360*4+8*4 = 1472 bytes is divisible by 64. So I guess the reason for the error is that GIL, in your case, aligns image rows at 64-byte boundaries and therefore doesn't store them contiguously.
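The divisibility argument can be checked with a small helper (hypothetical, not part of GIL) that rounds a packed row up to the next multiple of an assumed alignment:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: size in bytes of a row of `width` pixels after
// rounding it up to the next multiple of `alignment` bytes.
std::size_t alignedRowBytes(std::size_t width, std::size_t bytesPerPixel,
                            std::size_t alignment) {
    assert(alignment != 0);
    std::size_t packed = width * bytesPerPixel;
    return (packed + alignment - 1) / alignment * alignment;
}
```

With 32-byte alignment a 360-pixel BGRA row stays at 1440 bytes, while 64-byte alignment gives 1472 bytes, i.e. (1472 - 1440) / 4 = 8 padding pixels per row, which matches the temp += 8 workaround posted in the other answer.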

Because of this, it is generally advisable to use GIL's generic iterator interface instead of working with the raw memory directly. Otherwise you have to be completely sure about any such alignment conventions (they may well be standardized and documented somewhere).

OTHER TIPS

OK, I found a fix, but I still don't know the reason :) This works in my case, for images of width 360:

UInt32 *temp = (UInt32*)&src(0,0);
for( size_t i = 0; i < numPixels; ++i ){
  if( i%360==0 && i!=0 ){
    // skip the 8 padding pixels (32 bytes) at the end of each 360-pixel row
    temp += 8;
  }
  boost::gil::bgr8_pixel_t pixk( (( *temp) & 0xff), ( (*temp>>8) & 0xff), ((*temp >> 16 )& 0xff));
  *itD++ = pixk;
  temp += 1;
}
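Why the shifts pick out B, G and R: on a little-endian CPU the first byte in memory ends up in the lowest 8 bits of a loaded 32-bit word. A minimal sketch, assuming little-endian byte order (which holds for iOS devices and x86) and using a hypothetical decodeBGRAWord helper; it loads the word with memcpy instead of casting to UInt32*, which avoids unaligned-access and strict-aliasing concerns:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct BGR { std::uint8_t b, g, r; };

// Hypothetical helper: decode one BGRA pixel the way the workaround does,
// by loading 4 bytes as a 32-bit word and shifting.
// Assumes a little-endian CPU: byte 0 (B) lands in bits 0-7, G in 8-15, R in 16-23.
BGR decodeBGRAWord(const std::uint8_t* p) {
    assert(p != nullptr);
    std::uint32_t word;
    std::memcpy(&word, p, sizeof word);  // no alignment requirement on p
    return BGR{ static_cast<std::uint8_t>(word & 0xff),
                static_cast<std::uint8_t>((word >> 8) & 0xff),
                static_cast<std::uint8_t>((word >> 16) & 0xff) };
}
```

On a big-endian machine the same shifts would return A, R, G instead, so reading the bytes individually, as in the UInt8 version, is the endianness-independent choice.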

On iOS, this byte-pointer version works even better:

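UInt8 *temp = (UInt8*)&src(0,0);
for( size_t i = 0; i < numPixels; ++i ){
  if( i%360==0 && i!=0 ){
    // skip the 32 bytes of padding at the end of each 360-pixel row
    temp += 8*4;
  }
  // B, G and R are the first three bytes of each 4-byte BGRA pixel
  boost::gil::bgr8_pixel_t pixk( temp[0], temp[1], temp[2] );
  *itD++ = pixk;
  temp += 4;
}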

Replacing the destination iterator with a raw pointer as well improves speed further (tested on iOS).
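A sketch of a version that handles any row padding instead of hard-coding width 360 and 8 padding pixels, and that writes the destination through a raw pointer too. The function name and stride parameters are hypothetical; it assumes the caller knows the byte distance between the starts of consecutive rows in each buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stride-aware BGRA -> BGR conversion on raw buffers.
// srcStride / dstStride: bytes between the starts of consecutive rows; they
// may exceed width * 4 / width * 3 when rows are padded.
void convertBGRAtoBGR(const std::uint8_t* src, std::size_t srcStride,
                      std::uint8_t* dst, std::size_t dstStride,
                      std::size_t width, std::size_t height) {
    assert(srcStride >= width * 4);
    assert(dstStride >= width * 3);
    for (std::size_t y = 0; y < height; ++y) {
        // Recompute each row start from the stride instead of assuming the
        // previous row ended exactly where this one begins.
        const std::uint8_t* s = src + y * srcStride;
        std::uint8_t* d = dst + y * dstStride;
        for (std::size_t x = 0; x < width; ++x) {
            d[0] = s[0];  // B
            d[1] = s[1];  // G
            d[2] = s[2];  // R; alpha in s[3] is dropped
            s += 4;
            d += 3;
        }
    }
}
```

With GIL views, the source stride could be obtained, for example, as the byte difference between &src(0, 1) and &src(0, 0), or each row start taken from src.row_begin(y); this is an untested suggestion. The inner per-row loop is also where a NEON version (for example vld4q_u8 to deinterleave 16 BGRA pixels and vst3q_u8 to store them as BGR) could be slotted in.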

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow