Question

I'm trying to teach myself C++AMP, and would like to start with a very simple task from my field, that is image processing. I'd like to convert a 24 Bit-per-pixel RGB image (a Bitmap) to a 8 Bit-per-Pixel grayscale one. The image data is available in unsigned char arrays (obtained from Bitmap::LockBits(...) etc.)

I know that C++AMP for some reason cannot deal with char or unsigned char data via array or array_view, so I tried to use textures according to that blog. Here it is explained how 8bpp textures are written to, although VisualStudio 2013 tells me writeonly_texture_view was deprecated.

My code throws a runtime exception, saying "Failed to dispatch kernel." The complete text of the exception is lenghty:

ID3D11DeviceContext::Dispatch: The Unordered Access View (UAV) in slot 0 of the Compute Shader unit has the Format (R8_UINT). This format does not support being read from a shader as as UAV. This mismatch is invalid if the shader actually uses the view (e.g. it is not skipped due to shader code branching). It was unfortunately not possible to have all hardware implementations support reading this format as a UAV, despite that the format can written to as a UAV. If the shader only needs to perform reads but not writes to this resource, consider using a Shader Resource View instead of a UAV.

The code I use so far is this:

namespace gpu = concurrency;

gpu::extent<3> inputExtent(height, width, 3);
gpu::graphics::texture<unsigned int, 3> inputTexture(inputExtent, eight);
gpu::graphics::copy((void*)inputData24bpp, dataLength, inputTexture);
gpu::graphics::texture_view<unsigned int, 3> inputTexView(inputTexture);
gpu::graphics::texture<unsigned int, 2> outputTexture(width, height, eight);
gpu::graphics::writeonly_texture_view<unsigned int, 2> outputTexView(outputTexture);

gpu::parallel_for_each(outputTexture.extent,
    [inputTexView, outputTexView](gpu::index<2> pix) restrict(amp) {
    gpu::index<3> indR(pix[0], pix[1], 0);
    gpu::index<3> indG(pix[0], pix[1], 1);
    gpu::index<3> indB(pix[0], pix[1], 2);
    unsigned int sum = inputTexView[indR] + inputTexView[indG] + inputTexView[indB];
    outputTexView.set(pix, sum / 3);
});

gpu::graphics::copy(outputTexture, outputData8bpp);

What's the reason for this exception, and what can I do for a workaround?

Was it helpful?

Solution

I've also been learning C++Amp on my own and faced a very similar problem than yours, but in my case, I needed to deal with a 16 bit image.

Likely, the issue can be solved using textures although I can't help you on that due to a lack of experience.

So, what I did is basically based on bit masking.

First off, trick the compiler in order to let you compile:

unsigned int* sourceData = reinterpret_cast<unsigned int*>(source);
unsigned int* destData   = reinterpret_cast<unsigned int*>(dest);

Next, your array viewer has to see all your data. Be aware that viwer really thing your data is 32 bit sized. So, you have to make the conversion ( divided to 2 because 16 bits, use 4 for 8 bits).

concurrency::array_view<const unsigned int> source( (size+ 7)/2, sourceData) );
concurrency::array_view<unsigned int> dest( (size+ 7)/2, sourceData) );

Now, you are able to write a typical for_each block.

typedef concurrency::array_view<const unsigned int> OriginalImage;
typedef concurrency::array_view<unsigned int> ResultImage;

bool Filters::Filter_Invert()
{
    const int size = k_width*k_height;
    const int maxVal = GetMaxSize();

    OriginalImage& im_original = GetOriginal();
    ResultImage& im_result = GetResult();
    im_result.discard_data();

    parallel_for_each(
        concurrency::extent<2>(k_width, k_height), 
        [=](concurrency::index<2> idx) restrict(amp)
    {
        const int pos = GetPos(idx);
        const int val = read_int16(im_original, pos);

        write_int16(im_result, pos, maxVal - val);
    });

    return true;
}

int Filters::GetPos( const concurrency::index<2>& idx )  restrict(amp, cpu)
{
    return idx[0] * Filters::k_height + idx[1];
}

And here it comes the magic:

template <typename T>
unsigned int read_int16(T& arr, int idx) restrict(amp, cpu)
{
    return (arr[idx >> 1] & (0xFFFF << ((idx & 0x7) << 4))) >> ((idx & 0x7) << 4);
}

template<typename T>
void write_int16(T& arr, int idx, unsigned int val) restrict(amp, cpu)
{
    atomic_fetch_xor(&arr[idx >> 1], arr[idx >> 1] & (0xFFFF << ((idx & 0x7) << 4)));
    atomic_fetch_xor(&arr[idx >> 1], (val & 0xFFFF) << ((idx & 0x7) << 4));
}

Notice that this methods are for 16 bits for 8 bits won't work but it shouldn't be too difficult to adapt it to 8 bits. In fact, this was based on a 8 bit version, unfortunately, I couldn't find the reference.

Hope it helps.

David

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top