How can I read the bandwidth in use over the PCIe bus?

https://stackoverflow.com/questions/3759079

04-10-2019

Question

I'm working on a streaming media application that pushes a lot of data to the graphics card at startup. The CPU does very little while the data is being pushed; it idles at close to zero percent usage.

I'd like to monitor which machines struggle to push the initial data and which ones cope, so that I can arrive at a minimum recommended specification for our customers' hardware.

I've found that PCs with PCIe 1.1 x16 slots struggle with the initial data being pushed to the graphics card.

My development PC has a PCIe 2.0 x16 slot, and it copes without problems with the large amount of data initially pushed to the graphics card.
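To put numbers on the difference, the theoretical per-direction bandwidth of a slot follows from its generation and lane count: PCIe 1.x signals at 2.5 GT/s and PCIe 2.0 at 5 GT/s, both with 8b/10b encoding, while PCIe 3.0 signals at 8 GT/s with 128b/130b encoding. A minimal C++ sketch of that arithmetic follows; the function name is hypothetical, and only line encoding is accounted for, so real throughput will be lower once packet and protocol overhead are included.

```cpp
#include <cassert>

// Theoretical one-direction bandwidth of a PCIe link in MB/s (1 MB = 1e6 bytes).
// Only line encoding is accounted for; packet headers and other protocol
// overhead reduce the usable figure further.
// `generation` is 1, 2 or 3 (PCIe 1.0 and 1.1 are both generation 1);
// `lanes` is the link width (1 for x1 ... 16 for x16).
double pcie_bandwidth_mb_s(int generation, int lanes) {
    assert(generation >= 1 && generation <= 3);
    assert(lanes > 0);
    struct Gen { double mega_transfers_per_s; int payload_bits; int line_bits; };
    static const Gen gens[] = {
        {2500.0, 8, 10},    // PCIe 1.x: 2.5 GT/s, 8b/10b encoding
        {5000.0, 8, 10},    // PCIe 2.0: 5.0 GT/s, 8b/10b encoding
        {8000.0, 128, 130}, // PCIe 3.0: 8.0 GT/s, 128b/130b encoding
    };
    const Gen& g = gens[generation - 1];
    // Each transfer carries one bit per lane; strip the encoding overhead,
    // then convert megabits to megabytes and scale by the lane count.
    const double payload_mbit_per_lane = g.mega_transfers_per_s * g.payload_bits / g.line_bits;
    return payload_mbit_per_lane / 8.0 * lanes;
}
```

By this arithmetic a PCIe 1.1 x16 slot tops out at 4000 MB/s per direction and a PCIe 2.0 x16 slot at 8000 MB/s, so the newer slot offers twice the headroom for the same startup push.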

I need numbers to prove (or disprove) my point.

What I'd like is to be able to determine:

- Which slot type is the graphics card in?
- What is the speed of that slot?
- The graphics card name
- The graphics card driver version

But most importantly, I want the data flow over the PCIe slot: if I can show that the PCIe bus is being maxed out with data, I can point to that as the bottleneck.
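Short of a bus counter, one way to get hard numbers is to time the startup transfer yourself and compare the achieved rate with the slot's theoretical ceiling. The following is a minimal sketch under that assumption; the function names and the `upload` callback are hypothetical placeholders for your own code. Graphics APIs usually queue work asynchronously, so the callback is assumed to wait until the data has actually been transferred (for example by calling glFinish() after an OpenGL upload).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>

// Times one call to `upload` and returns the observed throughput in MB/s
// (1 MB = 1e6 bytes). `upload` stands in for your own transfer code and is
// assumed not to return until the data has reached the card.
template <typename UploadFn>
double measure_throughput_mb_s(std::size_t bytes, UploadFn&& upload) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<UploadFn>(upload)();
    const auto stop = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(stop - start).count();
    if (seconds <= 0.0) {
        return 0.0;  // the clock could not resolve this call; transfer more data
    }
    return static_cast<double>(bytes) / 1e6 / seconds;
}

// Expresses a measured throughput as a percentage of a link's theoretical ceiling.
double bus_utilization_percent(double achieved_mb_s, double theoretical_mb_s) {
    assert(theoretical_mb_s > 0.0);
    return 100.0 * achieved_mb_s / theoretical_mb_s;
}
```

A result close to 100% on the slower machines would support the bus-bottleneck theory; a result well below it suggests the limit lies elsewhere (driver, memory, or the upload path itself). Timing several runs and taking the median smooths out one-off stalls.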

I know that system memory speed is also a factor here, since the data travels from RAM, over the PCIe bus, to the graphics card. Is there a way to determine the system memory speed as well?
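For the memory side, a portable starting point is to time large memcpy calls, which gives the copy throughput the machine actually achieves rather than the rated speed of the modules (the rated speed can also be queried through WMI, via the Win32_PhysicalMemory class). The sketch below assumes copy throughput is an acceptable proxy; the function name is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Rough host memory-copy throughput in MB/s (1 MB = 1e6 bytes copied).
// This is what memcpy achieves on this machine, not the rated DIMM speed;
// each copied byte is read once and written once. Use a buffer much larger
// than the CPU caches (e.g. 256 MB) so the figure reflects RAM, not cache.
double memcpy_throughput_mb_s(std::size_t bytes, int repeats) {
    assert(bytes > 0 && repeats > 0);
    std::vector<unsigned char> src(bytes, 1), dst(bytes, 0);
    std::memcpy(dst.data(), src.data(), bytes);  // warm-up pass touches every page
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        src[0] = static_cast<unsigned char>(i);  // vary the source between copies
        std::memcpy(dst.data(), src.data(), bytes);
    }
    const auto stop = std::chrono::steady_clock::now();
    volatile unsigned char sink = dst[bytes - 1];  // keep the copies observable
    (void)sink;
    const double seconds = std::chrono::duration<double>(stop - start).count();
    if (seconds <= 0.0) {
        return 0.0;  // too fast to time; use a bigger buffer or more repeats
    }
    return static_cast<double>(bytes) * repeats / 1e6 / seconds;
}
```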

Finally, I write in unmanaged C++, so accessing .NET libraries is not an option.

Solution

Do you get errors when pushing your massive amounts of data, or are you "simply" concerned with slow speed?

I doubt there's any easy way to monitor PCIe bandwidth usage, if it's possible at all. But it should be possible to query the bus type the video adapter is connected to via WMI and/or SetupAPI; I have no personal experience or helpful links for either, sorry.
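Whatever query you end up using, the speed and width a PCIe device has actually negotiated are recorded in the Link Status register of its PCI Express capability. Assuming you can obtain that raw 16-bit value (how to read it through SetupAPI, WMI, or a vendor tool is outside this sketch), decoding it is straightforward; the struct and function names are hypothetical. Some graphics cards drop to a lower link speed when idle to save power, so sample it while the transfer is running.

```cpp
#include <cassert>
#include <cstdint>

// Fields decoded from the 16-bit Link Status register of a device's
// PCI Express capability structure.
struct PcieLinkStatus {
    double gigatransfers_per_s;  // current signalling rate, 0.0 if the code is not handled
    int width;                   // negotiated lane count (1, 4, 8, 16, ...)
};

// Bits 3:0 hold the Current Link Speed code; bits 9:4 the Negotiated Link Width.
// Codes 1..3 map to 2.5/5.0/8.0 GT/s on typical hardware (from PCIe 3.0 on,
// the code formally indexes the Supported Link Speeds Vector).
PcieLinkStatus decode_link_status(std::uint16_t reg) {
    const unsigned speed_code = reg & 0xFu;
    const int width = static_cast<int>((reg >> 4) & 0x3Fu);
    double gts = 0.0;
    switch (speed_code) {
        case 1: gts = 2.5; break;  // PCIe 1.x
        case 2: gts = 5.0; break;  // PCIe 2.0
        case 3: gts = 8.0; break;  // PCIe 3.0
        default: break;            // reserved or a generation not handled here
    }
    return {gts, width};
}
```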

OTHER TIPS

For Nvidia GPUs, you can try using NvAPI_GPU_GetDynamicPstatesInfoEx:

Nvidia, through its GeForce driver, exposes a programming interface ("NVAPI") that, among other things, allows for collecting performance measurements. For the technically inclined, here is the relevant section in the nvapi.h header file:

FUNCTION NAME: NvAPI_GPU_GetDynamicPstatesInfoEx

DESCRIPTION: This API retrieves the NV_GPU_DYNAMIC_PSTATES_INFO_EX structure for the specified physical GPU. Each domain's info is indexed in the array. For example:

pDynamicPstatesInfo->utilization[NVAPI_GPU_UTILIZATION_DOMAIN_GPU] holds the info for the GPU domain. There are currently four domains for which GPU utilization and dynamic P-state thresholds can be retrieved: graphic engine (GPU), frame buffer (FB), video engine (VID), and bus interface (BUS).

Beyond this header commentary, the API's specific functionality isn't documented. The information below is our best interpretation of its workings, though it relies on a lot of conjecture.

The graphics engine ("GPU") metric is expected to be your bottleneck in most games. If you don't see this at or close to 100%, something else (like your CPU or memory subsystem) is limiting performance.

The frame buffer ("FB") metric is interesting, if it works as intended. From the name, you'd expect it to measure graphics memory utilization (the percentage of memory used). That is not what it measures, though; it appears instead to be the memory controller's utilization in percent. If that's correct, it measures the actual bandwidth being used by the controller, which is not available as a measurement any other way.

We're not as interested in the video engine ("VID"); it's not generally used in gaming and typically registers a flat 0%. You'd only see the dial move if you're encoding video through ShadowPlay or streaming to a Shield.

The bus interface ("BUS") metric refers to utilization of the PCIe controller, again, as a percentage. The corresponding measurement, which you can trace in EVGA PrecisionX and MSI Afterburner, is called "GPU BUS Usage".
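If you poll the BUS domain while your startup data is being pushed, a summary of the samples gives a single figure per machine to compare. This is a sketch only: the enum below is a hypothetical local mirror of the four domains in the order the nvapi.h comment lists them (use the real NVAPI_GPU_UTILIZATION_DOMAIN_* constants when calling NVAPI), and each sample is assumed to be copied by your own code from the utilization array that NvAPI_GPU_GetDynamicPstatesInfoEx fills in. Given the doubts about this metric, treat the result as indicative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical local mirror of the four utilization domains (GPU, FB, VID, BUS).
enum class UtilDomain : std::size_t { Gpu = 0, FrameBuffer = 1, Video = 2, Bus = 3 };

// One poll of the utilization array: a percentage per domain.
struct UtilSample { unsigned percent[4]; };

struct BusSummary { double average_percent; unsigned peak_percent; };

// Average and peak BUS utilization over samples taken during the startup push.
BusSummary summarize_bus(const std::vector<UtilSample>& samples) {
    assert(!samples.empty());
    const auto bus = static_cast<std::size_t>(UtilDomain::Bus);
    double total = 0.0;
    unsigned peak = 0;
    for (const UtilSample& s : samples) {
        total += s.percent[bus];
        if (s.percent[bus] > peak) {
            peak = s.percent[bus];
        }
    }
    return {total / samples.size(), peak};
}
```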

We asked Nvidia to shed some light on the inner workings of NVAPI. Its response confirmed that the FB metric measures graphics memory bandwidth usage, but Nvidia dismissed the BUS metric as "considered to be unreliable and thus not used internally".

We asked AMD if it had any API or function that allowed for similar measurements. After internal verification, company representatives confirmed that they did not. As much as we would like to, we are unable to conduct similar tests on AMD hardware.

Licensed under: CC-BY-SA with attribution
