Question

I want to calculate the theoretical peak performance of graphics hardware. Well, actually I want to understand the calculation.

Example with an AMD Radeon HD 6670: The AMD Accelerated Parallel Processing Programming Guide (http://developer.amd.com/download/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf) tells me in the middle of page 6-42 to take the number of Stream Cores (96), multiply it by the number of operations per cycle for each Stream Core (let's take a single-precision ADD, which would be 5), and multiply that by the core clock (800 MHz). That results in:

96 * 5 FLOPS * 800 MHz = 384,000 MFLOPS = 384 GFLOPS
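For concreteness, here is that same arithmetic as a tiny C snippet (just the numbers quoted above, hardcoded):

```c
#include <stdio.h>

int main(void) {
    /* Numbers quoted above for the Radeon HD 6670 */
    double stream_cores  = 96;   /* stream cores                                    */
    double ops_per_cycle = 5;    /* single-precision ADDs per stream core per cycle */
    double clock_ghz     = 0.8;  /* 800 MHz core clock                              */

    /* 96 * 5 * 0.8 = 384 GFLOPS */
    printf("%.0f GFLOPS\n", stream_cores * ops_per_cycle * clock_ghz);
    return 0;
}
```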

The very same document tells me on page D-4 that this particular device has a peak throughput of 768 GFLOPS, which is twice what I just calculated. Wikipedia and the AMD homepage state the same.

So my question is: Where am I missing the factor of two?


Solution

I am not sure about AMD hardware, but I remember that NVIDIA counted a MAD (multiply-add) operation as two FLOPs. Since a MAD is performed in one cycle, the theoretical peak performance was multiplied by two.

Other tips

480 processing elements (96 stream cores * 5 processing elements each) * 2 operations per cycle (one addition pipeline + one multiplication pipeline per element) * 800 MHz = 768 GFLOPS
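A minimal sketch of that calculation in plain C (illustrative arithmetic only, with the HD 6670 numbers hardcoded), showing where the factor of two comes from:

```c
#include <stdio.h>

int main(void) {
    /* Radeon HD 6670: 96 stream cores * 5 processing elements each = 480 */
    double processing_elements = 96.0 * 5.0;
    double flops_per_cycle     = 2.0;   /* multiply + add of a MAD counted separately */
    double clock_ghz           = 0.8;   /* 800 MHz core clock */

    /* 480 * 2 * 0.8 = 768 GFLOPS, twice the ADD-only figure from the question */
    printf("%.0f GFLOPS\n", processing_elements * flops_per_cycle * clock_ghz);
    return 0;
}
```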

When the code has too many levels of branching, utilization can drop to 1-4 active shaders per compute unit, i.e. 6-24 across the device's 6 compute units; at 2 operations per cycle and 800 MHz that translates to roughly 10-40 GFLOPS, which is even slower than a single CPU core.
