Question

I am trying to use C++ AMP to run a long-running kernel on the GPU. This requires using DirectX to create a device that won't time out. I am setting the flag, but Timeout Detection and Recovery (TDR) is still being triggered. I have a dedicated Radeon HD 7970 in my box with no monitor plugged into it. Is there anything else I need to do to keep Windows 8 from cancelling my kernel before it finishes?

#include <amp.h>
#include <amp_math.h>
#include <amp_graphics.h>
#include <d3d11.h>
#include <dxgi.h>

#include <vector>
#include <iostream>
#include <iomanip>
#include "amp_tinymt_rng.h"
#include "timer.h"
#include <assert.h>

#pragma comment(lib, "d3d11")
#pragma comment(lib, "dxgi")

// Inside main():
    unsigned int createDeviceFlags = D3D11_CREATE_DEVICE_DISABLE_GPU_TIMEOUT;
    ID3D11Device * pDevice = nullptr;
    ID3D11DeviceContext * pContext = nullptr;
    D3D_FEATURE_LEVEL targetFeatureLevels = D3D_FEATURE_LEVEL_11_1;
    D3D_FEATURE_LEVEL featureLevel;
    auto hr = D3D11CreateDevice(pAdapter, 
                            D3D_DRIVER_TYPE_UNKNOWN, 
                            nullptr, 
                            createDeviceFlags, 
                            &targetFeatureLevels, 
                            1, 
                            D3D11_SDK_VERSION, 
                            &pDevice, 
                            &featureLevel, 
                            &pContext);

    if (FAILED(hr) ||
        (featureLevel != D3D_FEATURE_LEVEL_11_1))
    {
        std::wcerr << "Failed to create Direct3D 11 device" << std::endl;
        return 10;
    }

    // Wrap the D3D device in an AMP accelerator_view.
    accelerator_view noTimeoutAcclView = concurrency::direct3d::create_accelerator_view(pDevice);
    std::wcout << noTimeoutAcclView.accelerator.description << std::endl;

    // Set up the kernel here (e_size is the extent of the computation, defined elsewhere)
    concurrency::parallel_for_each(noTimeoutAcclView, e_size, [=] (index<1> idx) restrict(amp) {
        // Execute kernel here
    });

Solution

Your snippet looks good; the problem must be elsewhere. Here are a few ideas:

  • Double-check all parallel_for_each invocations and make sure each one uses the accelerator_view tied to the device you created with this snippet (explicitly pass the accelerator_view as the first argument to parallel_for_each).

  • If possible, reduce the problem size and see if your code runs without a TDR; perhaps something else is triggering it (driver bugs, for example, are a common cause of TDRs). Once you know that your algorithm runs correctly for a smaller problem, you can go back to investigating why a TDR is triggered at the larger problem size.

  • Disable TDR completely (see the MSDN article on TDR registry keys) and check whether your large problem set ever completes. If it does, go back to the first point: it indicates your code is running on an accelerator_view that still has TDR enabled.
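As a sketch of the third point: TDR behavior is controlled by registry values under `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers`, documented on MSDN as "TDR Registry Keys". For example, a .reg file like the following disables timeout detection entirely (`TdrLevel` = 0). A reboot is required for the change to take effect, and this should only be done on a development machine, since a hung GPU will then freeze the display indefinitely:

```reg
Windows Registry Editor Version 5.00

; Disable GPU Timeout Detection and Recovery (development machines only).
; TdrLevel = 0 turns detection off; the default is 3 (recover on timeout).
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrLevel"=dword:00000000
```

A less drastic alternative is to leave detection on but lengthen the timeout via the `TdrDelay` value (a REG_DWORD in seconds; the default is 2), e.g. `"TdrDelay"=dword:0000003c` for 60 seconds.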

Good luck!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow