Question

What counts more when CUDA kernel execution speed is of vital importance: the clock frequency of the cores or the number of SMs?

I can choose between a Quadro K5000 and a GTX 670 and I cannot decide. Memory seems sufficient in both cases, but the Quadro has more SMs while the GTX has a higher clock rate (I assume this value is per core).


Solution

It depends on what you are trying to execute. Will your program make use of all the cores of the Quadro? If not, the GTX will be faster. If it does, and the GTX would need more than one grid to cover the same work, you should do the math, but the Quadro will probably be faster.
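To make "do the math" concrete, here is a minimal sketch of one way to estimate it. It assumes a compute-bound kernel, identical per-SM performance on both cards, and that the work runs in rounds of blocks sized by the number of SMs. The function name `relative_kernel_time` and the `blocks_per_sm` parameter are made up for illustration; the SM counts and clocks are the ones quoted elsewhere on this page.

```python
import math

def relative_kernel_time(num_blocks, num_sms, clock_mhz, blocks_per_sm=1):
    """Rough relative run time for a compute-bound kernel (arbitrary units)."""
    # Blocks that can be resident on the whole GPU at once (assumed model).
    concurrent_blocks = num_sms * blocks_per_sm
    # Number of rounds needed to run every block.
    rounds = math.ceil(num_blocks / concurrent_blocks)
    # Each round is assumed to take time inversely proportional to the clock.
    return rounds / clock_mhz

# SM counts and clocks as quoted on this page, not independently verified.
GTX670 = {"num_sms": 7, "clock_mhz": 915}
K5000 = {"num_sms": 8, "clock_mhz": 706}

for blocks in (7, 8, 56, 1000):
    t_gtx = relative_kernel_time(blocks, **GTX670)
    t_quadro = relative_kernel_time(blocks, **K5000)
    faster = "GTX 670" if t_gtx < t_quadro else "Quadro K5000"
    print(f"{blocks:5d} blocks: faster = {faster}")
```

Under these assumptions the Quadro only comes out ahead in the narrow case where the extra SM saves a whole round (8 blocks here); for the other block counts tried, the higher clock wins. A memory-bound kernel would need a different model.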

OTHER TIPS

The Quadro K5000 and GTX670 are both based on the same GK104 silicon. The Quadro has 8 SMs active instead of 7 on the GTX670. The GTX670 runs at 915 MHz, whereas the Quadro runs at 706 MHz. Because the SMs are the same design, SM count times clock is a reasonable proxy for compute throughput, and by that measure the GTX670 comes out ahead. The Quadro has 172 GB/s of memory bandwidth vs. 192 GB/s on the GTX670, so bandwidth is also better on the GTX670.
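The comparison above can be checked with a few lines of arithmetic. This is only a sketch: it uses the figures quoted in this answer as-is, and `compute_score` is a made-up helper that treats SMs times clock as a stand-in for peak compute throughput.

```python
# Figures as quoted in this answer; treat them as the thread's numbers.
cards = {
    "GTX 670": {"sms": 7, "clock_mhz": 915, "bandwidth_gbs": 192},
    "Quadro K5000": {"sms": 8, "clock_mhz": 706, "bandwidth_gbs": 172},
}

def compute_score(card):
    # Both cards use the same GK104 SM design, so SMs x clock is used
    # here as a rough proxy for compute throughput (an assumption).
    return card["sms"] * card["clock_mhz"]

gtx, quadro = cards["GTX 670"], cards["Quadro K5000"]
compute_ratio = compute_score(gtx) / compute_score(quadro)
bandwidth_ratio = gtx["bandwidth_gbs"] / quadro["bandwidth_gbs"]

print(f"compute ratio (GTX / Quadro):   {compute_ratio:.2f}")
print(f"bandwidth ratio (GTX / Quadro): {bandwidth_ratio:.2f}")
```

With these inputs the GTX670 is ahead by roughly 13% on the compute proxy and roughly 12% on bandwidth.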

Go with the GTX670 if the decision is based purely on speed. You might also want to consider the GTX780 or Titan if budget allows.

The question of which GPU card to select, and why, is covered in chapter 11 of this textbook.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow