I've also been learning C++ AMP on my own and ran into a problem very similar to yours, except that I needed to process a 16-bit image.
The issue can probably be solved with textures, but I don't have enough experience with them to help you there.
So what I did is based on bit masking: the 16-bit pixels are stored in 32-bit words, and each access picks out the half it needs.
First, reinterpret your buffers as 32-bit data so that the code compiles:
    unsigned int* sourceData = reinterpret_cast<unsigned int*>(source);
    unsigned int* destData = reinterpret_cast<unsigned int*>(dest);
Next, your array_view has to cover all of your data. Keep in mind that the view really thinks each element is 32 bits wide, so you have to convert the element count: divide by 2 for 16-bit data (by 4 for 8-bit data), rounding up.
    // size is the number of 16-bit pixels; round up to whole 32-bit words.
    concurrency::array_view<const unsigned int> sourceView((size + 1) / 2, sourceData);
    concurrency::array_view<unsigned int> destView((size + 1) / 2, destData);
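To make the size conversion concrete, here is a minimal sketch in plain C++ (no AMP needed). The helper name `words_for_elements` is my own invention, not part of any library. It rounds up, so an odd number of pixels still gets a final word; this assumes the underlying buffer is padded to a multiple of 4 bytes, otherwise the last word would read past the end.

```cpp
#include <cstddef>

// Hypothetical helper (not part of C++ AMP): number of 32-bit words needed
// to hold `count` elements of `bits_per_element` bits each, rounded up.
// bits_per_element is assumed to be 8 or 16.
constexpr std::size_t words_for_elements(std::size_t count,
                                         std::size_t bits_per_element)
{
    // 2 elements per word for 16-bit data, 4 for 8-bit data.
    return (count + (32 / bits_per_element) - 1) / (32 / bits_per_element);
}
```

For example, five 16-bit pixels need three words, and five 8-bit pixels need two.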
Now you can write a typical parallel_for_each block.
    typedef concurrency::array_view<const unsigned int> OriginalImage;
    typedef concurrency::array_view<unsigned int> ResultImage;

    bool Filters::Filter_Invert()
    {
        const int size = k_width*k_height;
        const int maxVal = GetMaxSize();
        OriginalImage& im_original = GetOriginal();
        ResultImage& im_result = GetResult();
        // The old contents of the result are not needed, so skip copying them to the accelerator.
        im_result.discard_data();
        parallel_for_each(
            concurrency::extent<2>(k_width, k_height),
            [=](concurrency::index<2> idx) restrict(amp)
            {
                // One thread per 16-bit pixel; two neighbouring pixels share a 32-bit word.
                const int pos = GetPos(idx);
                const int val = read_int16(im_original, pos);
                write_int16(im_result, pos, maxVal - val);
            });
        return true;
    }

    // Assumed to be a static member: a restrict(amp) lambda cannot capture `this`.
    int Filters::GetPos( const concurrency::index<2>& idx ) restrict(amp, cpu)
    {
        return idx[0] * Filters::k_height + idx[1];
    }
And here comes the magic:
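    template <typename T>
    unsigned int read_int16(T& arr, int idx) restrict(amp, cpu)
    {
        // idx >> 1 selects the 32-bit word; (idx & 0x1) << 4 is the bit offset (0 or 16) inside it.
        return (arr[idx >> 1] & (0xFFFFu << ((idx & 0x1) << 4))) >> ((idx & 0x1) << 4);
    }

    // restrict(amp) only: atomic_fetch_xor is not available in cpu-restricted code.
    template<typename T>
    void write_int16(T& arr, int idx, unsigned int val) restrict(amp)
    {
        // Clear the target half by XOR-ing it with its own current bits...
        atomic_fetch_xor(&arr[idx >> 1], arr[idx >> 1] & (0xFFFFu << ((idx & 0x1) << 4)));
        // ...then XOR the new 16-bit value into the cleared half.
        atomic_fetch_xor(&arr[idx >> 1], (val & 0xFFFF) << ((idx & 0x1) << 4));
    }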
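If the bit arithmetic is hard to follow, this CPU-only sketch in plain C++ does the same packing without AMP. The names `read_u16_cpu` and `write_u16_cpu` are made up for illustration, and plain `^=` stands in for `atomic_fetch_xor`. As far as I understand it, the atomics matter on the GPU because two threads handling neighbouring pixels update the same 32-bit word, and each XOR only changes that thread's own half.

```cpp
#include <cstdint>
#include <vector>

// CPU-only illustration (hypothetical names, not part of C++ AMP).
// Two 16-bit values share each 32-bit word: idx >> 1 picks the word,
// (idx & 0x1) << 4 gives a shift of 0 or 16 bits within it.
inline std::uint32_t read_u16_cpu(const std::vector<std::uint32_t>& words, int idx)
{
    const int shift = (idx & 0x1) << 4;
    return (words[idx >> 1] >> shift) & 0xFFFFu;
}

inline void write_u16_cpu(std::vector<std::uint32_t>& words, int idx, std::uint32_t val)
{
    const int shift = (idx & 0x1) << 4;
    std::uint32_t& w = words[idx >> 1];
    w ^= w & (0xFFFFu << shift);    // clear the old half by XOR-ing it with itself
    w ^= (val & 0xFFFFu) << shift;  // XOR the new value into the cleared half
}
```

Writing 0x1234 to index 0 of a word holding 0xFFFFFFFF leaves 0xFFFF1234, so the neighbouring pixel in the upper half is untouched.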
Note that these functions are written for 16-bit data and won't work as-is for 8-bit data, but adapting them should not be too difficult. In fact, they are based on an 8-bit version; unfortunately, I can't find the reference any more.
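For what it's worth, here is how I would expect the 8-bit adaptation to look, written as a CPU-only sketch in plain C++ so it can be checked without a GPU. The names `read_u8_cpu` and `write_u8_cpu` are hypothetical, and this is my reconstruction rather than the version I originally based my code on. Four bytes share a word, so the word index becomes idx >> 2, the mask 0xFF, and the shift (idx & 0x3) << 3. In an AMP kernel the two `^=` lines would presumably become `atomic_fetch_xor` calls.

```cpp
#include <cstdint>
#include <vector>

// CPU-only illustration of an 8-bit variant (hypothetical names).
// Four 8-bit values share each 32-bit word: idx >> 2 picks the word,
// (idx & 0x3) << 3 gives a shift of 0, 8, 16 or 24 bits within it.
inline std::uint32_t read_u8_cpu(const std::vector<std::uint32_t>& words, int idx)
{
    const int shift = (idx & 0x3) << 3;
    return (words[idx >> 2] >> shift) & 0xFFu;
}

inline void write_u8_cpu(std::vector<std::uint32_t>& words, int idx, std::uint32_t val)
{
    const int shift = (idx & 0x3) << 3;
    std::uint32_t& w = words[idx >> 2];
    w ^= w & (0xFFu << shift);    // clear the old byte
    w ^= (val & 0xFFu) << shift;  // XOR in the new byte
}
```

The element count for the array_view then has to be divided by 4 (rounded up) instead of 2.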
Hope it helps.
David