C++ AMP exception in a simple image processing example

Question

I've also been learning C++ AMP on my own and ran into a problem very similar to yours, except that in my case I needed to deal with a 16-bit image.

The issue can probably be solved with textures, although I can't help you there due to lack of experience.

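So what I did instead relies on bit masking: two 16-bit pixels are packed into each 32-bit unsigned int, and each pixel is read and written with shifts and masks.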

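First, trick the compiler into letting you compile. array_view does not accept 8- or 16-bit element types such as unsigned short, so reinterpret the 16-bit buffers as unsigned int: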

// source and dest point to the 16-bit pixel buffers
unsigned int* sourceData = reinterpret_cast<unsigned int*>(source);
unsigned int* destData   = reinterpret_cast<unsigned int*>(dest);
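To see what the cast does to the data, here is a portable CPU sketch (plain C++, not C++ AMP) that copies a small 16-bit buffer into 32-bit words with std::memcpy, the byte-level equivalent of the reinterpret_cast without the aliasing concerns. It assumes a little-endian host such as x86, where pixel 2k ends up in the low 16 bits of word k and pixel 2k+1 in the high 16 bits. The name view_as_words is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: view a 16-bit pixel buffer as 32-bit words, the way the
// reinterpret_cast + array_view<unsigned int> combination does.
// Assumes a little-endian host. An odd pixel count gets one zero padding pixel.
std::vector<std::uint32_t> view_as_words(const std::vector<std::uint16_t>& pixels)
{
    std::vector<std::uint32_t> words((pixels.size() + 1) / 2, 0u);
    if (!pixels.empty())
    {
        std::memcpy(words.data(), pixels.data(),
                    pixels.size() * sizeof(std::uint16_t));
    }
    return words;
}
```

With this layout, pixel i lives in word i >> 1 at bit offset (i & 1) * 16, which is the addressing the read/write helpers further down are meant to perform.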

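Next, your array_view has to see all of your data. Be aware that the view really thinks your elements are 32 bits wide, so you have to convert the element count: divide by 2 for 16-bit pixels (by 4 for 8-bit pixels), rounding up. If the pixel count is odd, the last element extends past the end of the buffer, so pad the buffer to an even number of pixels.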

// Two 16-bit pixels per 32-bit element, rounded up.
concurrency::array_view<const unsigned int> sourceView((size + 1) / 2, sourceData);
concurrency::array_view<unsigned int> destView((size + 1) / 2, destData);
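The element-count conversion is a ceiling division by the number of pixels that fit in one 32-bit element. A small sketch of that arithmetic, with packed_word_count as a made-up name; for 16-bit data it reduces to (size + 1) / 2, and for 8-bit data to (size + 3) / 4.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of 32-bit elements needed to hold `pixels`
// values of `bitsPerPixel` bits each (16 -> 2 per element, 8 -> 4 per element),
// rounded up so a partially filled last element is still covered.
std::size_t packed_word_count(std::size_t pixels, std::size_t bitsPerPixel)
{
    const std::size_t perWord = 32 / bitsPerPixel;
    return (pixels + perWord - 1) / perWord;
}
```

For example, a 640x480 16-bit image needs packed_word_count(640 * 480, 16), i.e. 153600 elements.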

Now you can write a typical parallel_for_each block.

typedef concurrency::array_view<const unsigned int> OriginalImage;
typedef concurrency::array_view<unsigned int> ResultImage;

bool Filters::Filter_Invert()
{
    const int size = k_width*k_height;
    const int maxVal = GetMaxSize();

    OriginalImage& im_original = GetOriginal();
    ResultImage& im_result = GetResult();
    im_result.discard_data();

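    // One GPU thread per pixel. GetPos must be a static member (or a free
    // function), because 'this' cannot be captured in a restrict(amp) lambda.
    concurrency::parallel_for_each(
        concurrency::extent<2>(k_width, k_height),
        [=](concurrency::index<2> idx) restrict(amp)
    {
        const int pos = GetPos(idx);
        const int val = read_int16(im_original, pos);

        write_int16(im_result, pos, maxVal - val);
    });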

    return true;
}

// Declared static in the class; maps (x, y) to the linear position x * k_height + y.
int Filters::GetPos(const concurrency::index<2>& idx) restrict(amp, cpu)
{
    return idx[0] * Filters::k_height + idx[1];
}
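Before trusting the GPU result, it can help to run the same filter on the CPU and compare. Here is a sketch in plain C++ under these assumptions: the packed buffer is a std::vector<std::uint32_t> with two pixels per element (pixel i in element i >> 1 at bit offset (i & 1) * 16), maxVal is 65535 for full-range 16-bit data, and invert_packed_cpu is a hypothetical name. It visits positions the same way GetPos does (x * height + y), which covers every position from 0 to width * height - 1 exactly once.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// CPU reference for the invert filter on a buffer holding two 16-bit
// pixels per 32-bit element. Hypothetical helper, not part of C++ AMP.
std::vector<std::uint32_t> invert_packed_cpu(const std::vector<std::uint32_t>& src,
                                              int width, int height,
                                              std::uint32_t maxVal)
{
    std::vector<std::uint32_t> dst(src.size(), 0u);
    for (int x = 0; x < width; ++x)
    {
        for (int y = 0; y < height; ++y)
        {
            // Same mapping as GetPos: idx[0] * k_height + idx[1].
            const int pos = x * height + y;
            const unsigned shift = static_cast<unsigned>(pos & 0x1) << 4;
            const std::uint32_t val = (src[pos >> 1] >> shift) & 0xFFFFu;
            // Sequential code, so a plain |= is enough (dst starts zeroed).
            dst[pos >> 1] |= ((maxVal - val) & 0xFFFFu) << shift;
        }
    }
    return dst;
}
```

Inverting twice should give back the original pixels, which makes a quick sanity check for both the CPU and the GPU versions.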

And here comes the magic:

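// Two 16-bit pixels per 32-bit element: pixel idx lives in element idx >> 1,
// at bit offset (idx & 0x1) << 4, i.e. 0 or 16.
template <typename T>
unsigned int read_int16(T& arr, int idx) restrict(amp, cpu)
{
    const unsigned int shift = (idx & 0x1) << 4;
    return (arr[idx >> 1] >> shift) & 0xFFFFu;
}

// atomic_fetch_xor is only available in restrict(amp) code, so this helper is amp-only.
template<typename T>
void write_int16(T& arr, int idx, unsigned int val) restrict(amp)
{
    const unsigned int shift = (idx & 0x1) << 4;
    const unsigned int mask = 0xFFFFu << shift;
    // First XOR clears this pixel's 16 bits; the neighbouring pixel is untouched.
    concurrency::atomic_fetch_xor(&arr[idx >> 1], arr[idx >> 1] & mask);
    // Second XOR writes the new value into the now-zero half.
    concurrency::atomic_fetch_xor(&arr[idx >> 1], (val & 0xFFFFu) << shift);
}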
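The trick in write_int16 is that the first XOR clears the pixel's 16 bits (x ^ (x & mask) keeps only the other half) and the second XOR sets the new value, while the other half of the element is never touched. So two threads writing the two pixels of the same element do not corrupt each other. The sketch below reproduces this on the CPU, with std::atomic<std::uint32_t>::fetch_xor standing in for concurrency::atomic_fetch_xor. read16, write16 and concurrent_demo are hypothetical names, and the sketch keeps the original's assumption that each pixel is written by exactly one thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// CPU stand-ins for read_int16 / write_int16 (hypothetical names).
// Pixel idx lives in element idx >> 1 at bit offset (idx & 0x1) << 4.
std::uint32_t read16(const std::vector<std::atomic<std::uint32_t>>& arr, int idx)
{
    const unsigned shift = static_cast<unsigned>(idx & 0x1) << 4;
    return (arr[idx >> 1].load() >> shift) & 0xFFFFu;
}

void write16(std::vector<std::atomic<std::uint32_t>>& arr, int idx, std::uint32_t val)
{
    const unsigned shift = static_cast<unsigned>(idx & 0x1) << 4;
    const std::uint32_t mask = 0xFFFFu << shift;
    std::atomic<std::uint32_t>& word = arr[idx >> 1];
    // XOR with our own current bits clears them; the other half is untouched.
    word.fetch_xor(word.load() & mask);
    // XOR the new value into the now-zero half.
    word.fetch_xor((val & 0xFFFFu) << shift);
}

// Two threads repeatedly write the two halves of the same element.
std::uint32_t concurrent_demo()
{
    std::vector<std::atomic<std::uint32_t>> arr(1);
    arr[0].store(0xABCD1234u);
    std::thread t0([&] {
        for (int i = 0; i < 10000; ++i) write16(arr, 0, static_cast<std::uint32_t>(i));
    });
    std::thread t1([&] {
        for (int i = 0; i < 10000; ++i) write16(arr, 1, static_cast<std::uint32_t>(i));
    });
    t0.join();
    t1.join();
    return arr[0].load(); // both halves end at 9999 (0x270F)
}
```

The first XOR reads the element non-atomically; that is safe only because no other thread changes those particular 16 bits. An atomic_fetch_and with ~mask followed by an atomic_fetch_or would be an equivalent way to clear and set the half.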

Notice that these helpers are for 16-bit pixels and won't work for 8 bits as they are, but adapting them to 8 bits shouldn't be too difficult. In fact, this was based on an 8-bit version; unfortunately, I couldn't find the reference.
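Following the same pattern, an 8-bit version would pack four pixels per element: the element index becomes idx >> 2, the bit offset (idx & 0x3) << 3, and the mask 0xFF. Below is a single-threaded CPU sketch of that adaptation; read8 and write8 are hypothetical names. In a C++ AMP kernel the write would be marked restrict(amp) and use concurrency::atomic_fetch_xor, as write_int16 does.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// 8-bit variant of the packing: four pixels per 32-bit element.
// Pixel idx lives in element idx >> 2 at bit offset (idx & 0x3) << 3.
std::uint32_t read8(const std::vector<std::uint32_t>& arr, int idx)
{
    const unsigned shift = static_cast<unsigned>(idx & 0x3) << 3;
    return (arr[idx >> 2] >> shift) & 0xFFu;
}

// Single-threaded version of the two-XOR write.
void write8(std::vector<std::uint32_t>& arr, int idx, std::uint32_t val)
{
    const unsigned shift = static_cast<unsigned>(idx & 0x3) << 3;
    std::uint32_t& word = arr[idx >> 2];
    word ^= word & (0xFFu << shift); // first XOR clears this pixel's byte
    word ^= (val & 0xFFu) << shift;  // second XOR sets the new value
}
```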

Hope it helps.

David