Question

I’m having a problem writing some C++ AMP code, and I have included a sample below. It runs fine on emulated accelerators but crashes the display driver on my hardware (Windows 7, NVIDIA GeForce GTX 660, latest drivers), and I can see nothing wrong with my code.

Is there a problem with my code, or is this a hardware/driver/compiler issue?

#include "stdafx.h"

#include <vector>
#include <iostream>
#include <amp.h>

int _tmain(int argc, _TCHAR* argv[])
{
    // Prints "NVIDIA GeForce GTX 660"
    concurrency::accelerator_view target_view = concurrency::accelerator().create_view();
    std::wcout << target_view.accelerator.description << std::endl;

    // lower numbers do not cause the issue
    const int x = 2000;
    const int y = 30000;

    // 1d array for storing result
    std::vector<unsigned int> resultVector(y);
    Concurrency::array_view<unsigned int, 1> resultsArrayView(resultVector.size(), resultVector);

    // 2d array for data for processing 
    std::vector<unsigned int> dataVector(x * y);
    concurrency::array_view<unsigned int, 2> dataArrayView(y, x, dataVector);
    parallel_for_each(
        // Define the compute domain, which is the set of threads that are created.
        resultsArrayView.extent,
        // Define the code to run on each thread on the accelerator.
        [=](concurrency::index<1> idx) restrict(amp)
    {
        concurrency::array_view<unsigned int, 1> buffer = dataArrayView[idx[0]];
        unsigned int bufferSize = buffer.get_extent().size();

        // needs both loops to cause crash
        for (unsigned int outer = 0; outer < bufferSize; outer++)
        {
            for (unsigned int i = 0; i < bufferSize; i++)
            {
                // works without this line, also if I change to buffer[0] it works?
                dataArrayView[idx[0]][0] = 0;
            }
        }
        // works without this line
        resultsArrayView[0] = 0;
    });

    std::cout << "crash on next line" << std::endl;
    resultsArrayView.synchronize();
    std::cout << "will never reach me" << std::endl; 

    system("PAUSE");
    return 0;
}

Solution

It is very likely that your computation exceeds the permitted quantum time (2 seconds by default). After that time the operating system steps in and forcefully restarts the GPU; this is called Timeout Detection and Recovery (TDR). The software adapter (reference device) does not have TDR enabled, which is why the computation can exceed the permitted quantum time there.

Does your computation really require 30000 threads (variable y), each performing 2000 * 2000 (x * x) loop iterations? You can chunk your computation so that each chunk takes less than 2 seconds to compute. You can also consider disabling TDR or extending the permitted quantum time to fit your needs.
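As a rough illustration of the chunking idea, here is a minimal sketch in portable C++ that only plans the chunks; it does not call C++ AMP itself. The names Chunk, make_chunks, inner_iterations and the chunk size of 4096 rows are hypothetical and not part of the original code. A workable chunk size would have to be found by measuring on the actual hardware.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper type: one slice of the row (y) dimension.
struct Chunk {
    int offset; // first row handled by this kernel launch
    int count;  // number of rows (threads) in this launch
};

// Inner-loop iterations executed by `rows` threads when each thread runs
// two nested loops of length `row_length`, as the kernel in the question does.
std::uint64_t inner_iterations(int rows, int row_length)
{
    return static_cast<std::uint64_t>(rows) * row_length * row_length;
}

// Split `total_rows` into consecutive chunks of at most `rows_per_chunk` rows.
// The last chunk may be smaller than the others.
std::vector<Chunk> make_chunks(int total_rows, int rows_per_chunk)
{
    std::vector<Chunk> chunks;
    if (total_rows <= 0 || rows_per_chunk <= 0) {
        return chunks; // nothing to plan
    }
    for (int offset = 0; offset < total_rows; offset += rows_per_chunk) {
        chunks.push_back({offset, std::min(rows_per_chunk, total_rows - offset)});
    }
    return chunks;
}

// Assumed usage with C++ AMP (not compiled here): for each chunk, launch a
// separate parallel_for_each over concurrency::extent<1>(chunk.count), use
// chunk.offset + idx[0] as the row index inside the kernel, and wait for the
// launch to finish before starting the next one.
```

With the sizes from the question, make_chunks(30000, 4096) yields 8 chunks, the last one covering 1328 rows, and inner_iterations(30000, 2000) gives 120000000000 iterations for the whole job. Whether a given chunk stays under the 2 second limit depends on the GPU, so treat the chunk size as a tuning parameter.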

I highly recommend reading this blog post on how to handle TDRs in C++ AMP, which explains TDR in detail: http://blogs.msdn.com/b/nativeconcurrency/archive/2012/03/07/handling-tdrs-in-c-amp.aspx

Additionally, here is a separate blog post on how to disable TDR on Windows 8: http://blogs.msdn.com/b/nativeconcurrency/archive/2012/03/06/disabling-tdr-on-windows-8-for-your-c-amp-algorithms.aspx

Licensed under: CC-BY-SA with attribution