Question

This is not strictly a programming question, but it is closely related, and I have been unable to find information about it anywhere else. I hope it is acceptable here.

I am attempting to port a C++ AMP application to run on a Surface 2 tablet. C++ AMP works on WinRT on the tablet, it uses DirectX for GPU acceleration, and the Tegra processor does work with DirectX, so I was expecting a performance boost from using C++ AMP on the tablet compared with running on the CPU. The code is highly parallel: on the desktop I see about an 80x speedup running it on the GPU compared with a single CPU core.

As it turns out, the AMP solution on the tablet only gives me the benefit of the Tegra's four CPU cores, and nothing more.

I see three C++ AMP devices: WARP, REF, and CPU.

The default is WARP. REF is very much slower, and CPU seems to crash right now.

Are the SIMD cores simply not available to C++ AMP on a Tegra 4, or do I have to do something special?


Solution

In addition to any GPU accelerators, you will see the following accelerators if you enumerate the available accelerators using:

std::vector<concurrency::accelerator> accls = concurrency::accelerator::get_all();  // requires <amp.h> and <vector>

Taken from the C++ AMP Book:

- accelerator::direct3d_ref The REF accelerator, also called the Reference Rasterizer or “Software Adapter” accelerator. It emulates a generic graphics card in software on the CPU to provide Direct3D functionality. It is used for debugging and will also be the default accelerator if no other accelerators are available. As the name suggests, the REF accelerator should be considered the de facto standard if you suspect a bug with your hardware vendor’s driver. Typically, your application will not want to use the REF accelerator because it is much slower than hardware-based accelerators and will be slower than just running a C++ implementation of your algorithm on the CPU.

- accelerator::cpu_accelerator The CPU accelerator can be used only for creating arrays that are accessible to the CPU and used for data staging. Your application can’t use this for executing C++ AMP code in the first release of C++ AMP. Further details on using the CPU accelerator to create staging arrays and host arrays are covered in Chapter 7, “Optimization.”

- accelerator::direct3d_warp The WARP accelerator, or Microsoft Basic Render Driver, allows the C++ AMP run time to run on the CPU. The WARP accelerator uses the WARP software rasterizer, which is part of the Direct3D 11 run time. The WARP accelerator uses multicore and data-parallel Single Instruction Multiple Data (SIMD) instructions to execute data-parallel code very efficiently on the CPU. Your application can use WARP as a fallback when no physical GPU is present. The WARP accelerator supports only single-precision math, so it can’t be used for fallback for kernels that require double precision or limited double-precision kernels. An overview of WARP can be found in “Windows Advanced Rasterization Platform (WARP) Guide” on MSDN: http://msdn.microsoft.com/en-us/library/gg615082.aspx.

So the behavior you see is pretty much as expected.

However, I guess what you really want to know is where is the Tegra 4 GPU? You would expect to see this as an accelerator option if the Surface 2 has a DirectX 11 driver. DX11 is required for C++ AMP.

The Surface 2 is based on the Tegra 4 (T40a2) SoC, which appears to support only DX9.1 (Direct3D feature level 9_1).

On the DirectX side, Tegra 4’s GPU supports the Direct3D 9_1 feature level.

The quote above is taken from "Nvidia's Tegra 4 GPU: Doubling Down On Efficiency". I was unable to find any confirmation of this on Nvidia's own site(s). It looks like you need a Tegra K1 to get DX11 support. Sorry to be the bearer of bad news.
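If you want to check at run time whether a device reports any hardware accelerator at all, one approach is to filter the enumerated list on the is_emulated property. The sketch below models that check with a plain struct so it compiles without <amp.h>. AcceleratorInfo, has_hardware_accelerator and surface2_accelerators are hypothetical names used only for illustration. In real code the values would come from concurrency::accelerator::get_all(), reading each accelerator's device_path and is_emulated members.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the two concurrency::accelerator properties
// used here; in a real program they are read from accelerator::get_all().
struct AcceleratorInfo {
    std::wstring device_path;  // e.g. L"direct3d\\warp"
    bool is_emulated;          // true for software devices (WARP, REF, CPU)
};

// Returns true if at least one entry is a hardware (non-emulated) device.
bool has_hardware_accelerator(const std::vector<AcceleratorInfo>& accls) {
    for (const auto& a : accls) {
        if (!a.is_emulated) {
            return true;
        }
    }
    return false;
}

// The list described in the question: WARP, REF and CPU only.
// All three are treated as emulated in this sketch.
std::vector<AcceleratorInfo> surface2_accelerators() {
    return {
        {L"direct3d\\warp", true},
        {L"direct3d\\ref", true},
        {L"cpu", true},
    };
}
```

With the list from the question, has_hardware_accelerator(surface2_accelerators()) is false, which matches the conclusion that no DX11-capable GPU is exposed on the Surface 2.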

On the plus side, if you target your code at the default accelerator, then other hardware that does have a DX11-capable GPU will automatically take advantage of it.

Licensed under: CC-BY-SA with attribution