Question

I just read this:

Get the status of a std::future

Since the functionality of Concurrency::completion_future appears to mimic std::future, I thought I could do something similar, but this relatively simple example fails:

#include <assert.h>
#include <chrono>
#include <iostream>
#include <amp.h>

int main()
{
    using namespace Concurrency;
    int big = 1000000; // this should take a while to send back to the host
    array_view<int> av(big);

    parallel_for_each(extent<1>(big), [=](index<1> idx) restrict(amp)
    {
        av[idx] = idx[0];
    });
    int i = 0;
    completion_future future = av.synchronize_async();

    // this should be false; how could it instantly send back so much data?
    bool const gpuFinished = future.wait_for(std::chrono::seconds(0)) == std::future_status::ready;

    assert(!gpuFinished); // FAIL! why?

    future.wait();

    system("pause");
}

Why would that assert fail?

Solution

The behavior observed in the question is correct.

array_view<int> av(big) creates an array_view without a data source, whereas av.synchronize_async() synchronizes modifications back to the data source. For an array_view without a data source, the call is therefore a no-op by definition. By extension, it also does not force execution of the preceding parallel_for_each.

If the intention is to synchronize the data to CPU memory, this must be requested explicitly with av.synchronize_to_async(accelerator(accelerator::cpu_accelerator).default_view). The returned completion_future then becomes ready only when the preceding parallel_for_each and the (optional) copy operation have finished.

Replacing the former synchronization call with the latter makes the assertion pass. Keep in mind that it may still fail (by design) on systems where the CPU and the accelerator share memory, or with some rare timings.

Other tips

Disclaimer: I'm not an expert in AMP.

AFAIK, array_view doesn't represent anything by itself. It is just a view that you tie to something. So your code doesn't really make sense to me: there is no backing memory on the CPU to synchronize with.

Try the following code:

#include <assert.h>
#include <chrono>
#include <iostream>
#include <amp.h>
#include <numeric>
#include <vector>

int main()
{
    using namespace Concurrency;
    using namespace std;
    int big = 100000000; // this should take a while to send back to the host
    vector<int> vec(big);
    iota(begin(vec), end(vec), 0);
    array_view<int, 1> av(big, vec);

    parallel_for_each(Concurrency::extent<1>(big), [=](index<1> idx) restrict(amp)
    {
        av[idx] = av[idx] * av[idx];
    });
    int i = 0;
    completion_future future = av.synchronize_async();

    // expected to be false: copying this much data back to vec should not finish instantly
    bool const gpuFinished = future.wait_for(std::chrono::seconds(0)) == std::future_status::ready;

    assert(!gpuFinished); // typically holds now: the copy back to vec is still pending

    future.wait();
    std::cout << vec[5];
}

It's just a modification of your code that works as expected.

Licensed under: CC-BY-SA with attribution