Question

I am trying to write a convolution filter that uses a specific function to determine the exact output color of each pixel. Does anybody know if it is possible to define a function that can be called inside a parallel_for_each block?


Solution

A function must follow a number of rules to compile successfully with restrict(amp). The first, as mentioned in the parallel_for_each() section, concerns the functions it calls. Those must be visible at code generation time and must also be marked with restrict(amp). If you are not using link-time code generation, this essentially means they must be in the same .cpp file at compile time, possibly by way of a header file included in that .cpp file. If you use /ltcg when compiling both .cpp files (the one that calls the function and the one that implements it) as well as when linking, then you can keep the calling and called functions in separate files.

A C++ AMP-compatible function or lambda can only use C++ AMP-compatible types, which include the following:

  • int
  • unsigned int
  • float
  • double
  • C-style arrays of int, unsigned int, float, or double
  • concurrency::array_view or references to concurrency::array
  • structs containing only C++ AMP-compatible types

This means that some data types are forbidden:

  • bool (can be used for local variables in the lambda)
  • char
  • short
  • long long
  • unsigned versions of the above

References and pointers (to a compatible type) may be used locally but cannot be captured by a lambda. Function pointers, pointer-to-pointer, and the like are not allowed; neither are static or global variables.

Classes must meet more rules if you wish to use instances of them. They must have no virtual functions or virtual inheritance. Constructors, destructors, and other nonvirtual functions are allowed. The member variables must all be of compatible types, which could of course include instances of other classes as long as those classes meet the same rules.
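As a rough illustration of the class rules above, here is a minimal sketch in standard C++. The Rgb and Shape types are hypothetical and not from the original answer. The static_assert checks cover only the "no virtual functions" rule; they say nothing about member types or virtual inheritance, so treat them as a partial sanity check rather than a full compatibility test.

```cpp
#include <type_traits>

// Hypothetical pixel type: only float members and no virtual functions,
// so it fits the rules described above.
struct Rgb
{
    float r, g, b;

    // Constructors and other nonvirtual member functions are allowed.
    Rgb(float r_, float g_, float b_) : r(r_), g(g_), b(b_) {}

    // To be callable inside a kernel, this member would also need to be
    // marked restrict(amp) (omitted here so the sketch builds anywhere).
    float Sum() const { return r + g + b; }
};

// Hypothetical counter-example: the virtual functions rule this type out.
struct Shape
{
    float area;
    virtual float Area() const { return area; }
    virtual ~Shape() {}
};

// std::is_polymorphic detects only the virtual-function rule.
static_assert(!std::is_polymorphic<Rgb>::value, "Rgb has no virtual functions");
static_assert(std::is_polymorphic<Shape>::value, "Shape has virtual functions");
```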

The code in your AMP-compatible function does not run on a CPU and therefore can’t do certain things that you might be used to doing:

  • recursion
  • pointer casting
  • use of virtual functions
  • new or delete
  • RTTI or dynamic casting

Here's an example that I think does what you are trying to do, although it does not use tiling. The shift parameter is the radius of the square pixel mask. The example does not calculate new values for elements that lie within shift of the array edge. To avoid wasting threads on those elements, the parallel_for_each takes an extent that is shift * 2 elements smaller than the array in each dimension. The corrected index, idc, offsets idx by shift so that it refers to the correct element.

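#include <amp.h>
#include <amp_math.h>   // fast_math::sqrtf
#include <numeric>      // std::iota
#include <vector>

using namespace concurrency;

// TimeFunc is a timing helper that is not shown here; it is assumed to run the
// lambda and return the elapsed time.

// Declared before use so the kernel below can call it. The callee must be
// marked restrict(amp) and visible in this translation unit.
float WeightedAverage(index<2> idx, const array_view<const float, 2>& data, int shift)
    restrict(amp);

void MatrixSingleGpuExample(const int rows, const int cols, const int shift)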
{
    //  Initialize matrices

    std::vector<float> vA(rows * cols);
    std::vector<float> vC(rows * cols);
    std::iota(vA.begin(), vA.end(), 0.0f);

    //  Calculation

    accelerator_view view = accelerator(accelerator::default_accelerator).default_view;
    double time = TimeFunc(view, [&]()
    {
        array_view<const float, 2> a(rows, cols, vA); 
        array_view<float, 2> c(rows, cols, vC);
        c.discard_data();

        extent<2> ext(rows - shift * 2, cols - shift * 2);
        parallel_for_each(view, ext, [=](index<2> idx) restrict(amp)
        {
            index<2> idc(idx[0] + shift, idx[1] + shift);
            c[idc] = WeightedAverage(idc, a, shift);
        });
        c.synchronize();
    });
}

float WeightedAverage(index<2> idx, const array_view<const float, 2>& data, int shift) 
    restrict(amp)
{
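    // Guard both dimensions so the mask never reads outside the data.
    if (idx[0] < shift || idx[0] >= data.extent[0] - shift ||
        idx[1] < shift || idx[1] >= data.extent[1] - shift)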
        return 0.0f;
    float max = fast_math::sqrtf((float)(shift * shift * 2));
    float avg = 0.0f;
    float n = 0.0f;
    for (int i = -shift; i <= shift; ++i)
        for (int j = -shift; j <= shift; ++j)
        {
            int row = idx[0] + i;
            int col = idx[1] + j;
            // Weight falls off linearly with Euclidean distance from the mask centre.
            float scale = 1 - fast_math::sqrtf((float)((i * i) + (j * j))) / max;
            avg += data(row,col) * scale;
            n += 1.0f;
        }
    avg /= n;
    return avg;
}
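When checking a kernel like this, it can help to compare against a plain CPU version. The following is a minimal sketch in standard C++, not part of the original answer. The names WeightedAverageCpu and FilterCpu are hypothetical. It assumes a row-major rows x cols image, shift >= 1, a column offset of j, and a weight that falls off linearly with Euclidean distance from the mask centre.

```cpp
#include <cmath>
#include <vector>

// Hypothetical CPU reference for one output element.
// Assumes row-major storage and shift >= 1 (shift == 0 would divide by zero).
float WeightedAverageCpu(const std::vector<float>& data, int rows, int cols,
                         int row, int col, int shift)
{
    // Skip elements whose mask would reach outside the image.
    if (row < shift || row >= rows - shift || col < shift || col >= cols - shift)
        return 0.0f;

    // Distance from the centre to a corner of the mask.
    const float maxDist = std::sqrt(static_cast<float>(shift * shift * 2));
    float sum = 0.0f;
    int n = 0;
    for (int i = -shift; i <= shift; ++i)
        for (int j = -shift; j <= shift; ++j)
        {
            const float dist = std::sqrt(static_cast<float>(i * i + j * j));
            const float scale = 1.0f - dist / maxDist;
            sum += data[(row + i) * cols + (col + j)] * scale;
            ++n;
        }
    return sum / static_cast<float>(n);
}

// Hypothetical driver that mirrors the reduced extent and corrected index.
std::vector<float> FilterCpu(const std::vector<float>& in, int rows, int cols, int shift)
{
    std::vector<float> out(in.size(), 0.0f);
    // Visit only (rows - 2 * shift) x (cols - 2 * shift) elements.
    for (int r = 0; r < rows - shift * 2; ++r)
        for (int c = 0; c < cols - shift * 2; ++c)
        {
            const int rr = r + shift;  // corrected row, like idc[0]
            const int cc = c + shift;  // corrected column, like idc[1]
            out[rr * cols + cc] = WeightedAverageCpu(in, rows, cols, rr, cc, shift);
        }
    return out;
}
```

Calling FilterCpu on the same input vector and comparing the result element by element against vC, within a small tolerance because fast_math trades accuracy for speed, is one way to sanity-check the GPU output.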

OTHER TIPS

Yes. You need to annotate the function signature with restrict(amp), or with restrict(cpu, amp) if you also want to be able to call the same function from CPU code. See the MSDN docs on restrict.
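A minimal sketch of the restrict(cpu, amp) form follows. The Clamp01 function and the RESTRICT_CPU_AMP macro are hypothetical names introduced for illustration. The macro is there only so the snippet also builds with compilers that do not understand the restrict specifier; with the Microsoft compiler you would normally write restrict(cpu, amp) directly.

```cpp
// Assumption: only the Microsoft compiler supports the restrict(...) specifier.
// Elsewhere the hypothetical macro expands to nothing so the sketch still builds.
#if defined(_MSC_VER) && !defined(__clang__)
#define RESTRICT_CPU_AMP restrict(cpu, amp)
#else
#define RESTRICT_CPU_AMP
#endif

// Hypothetical helper that clamps a color channel to [0, 1].
// Marked for both targets, so the same body can be called from a
// parallel_for_each lambda and from ordinary CPU code.
inline float Clamp01(float v) RESTRICT_CPU_AMP
{
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}
```

Because the body uses only float comparisons, it stays within the type and language restrictions listed in the accepted answer.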

Licensed under: CC-BY-SA with attribution