Question

I'm having a hard time understanding why the theoretical instructions per cycle (IPC) of a Fermi-architecture NVIDIA GPU is 2, as stated on page 9 of http://on-demand.gputechconf.com/gtc-express/2011/presentations/Inst_limited_kernels_Oct2011.pdf.

According to section 5.4.1 of the CUDA C Programming Guide (http://docs.nvidia.com/cuda/cuda-c-programming-guide/#arithmetic-instructions), the throughput for 32-bit floating-point arithmetic is 32 fp32 instructions per clock cycle per SM.
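As a rough sketch, assuming the IPC figure counts warp-level instructions (one instruction issued on behalf of all 32 threads of a warp) while the programming-guide figure counts per-thread operations, the two numbers can be put on the same scale. The helper names below are hypothetical and only do the unit conversion.

```python
# Hypothetical unit conversion between per-thread throughput and warp-level IPC.
# Assumption: IPC counts warp instructions; the guide's figure counts per-thread ops.
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warp_instructions_per_cycle(thread_ops_per_cycle: float, warp_size: int = WARP_SIZE) -> float:
    """Convert per-thread operations per SM per cycle into warp instructions per SM per cycle."""
    return thread_ops_per_cycle / warp_size


def thread_ops_per_cycle(warp_ipc: float, warp_size: int = WARP_SIZE) -> float:
    """Convert warp instructions per SM per cycle into per-thread operations per SM per cycle."""
    return warp_ipc * warp_size


fp32_throughput = 32  # fp32 ops per clock per SM, from section 5.4.1 of the guide
fermi_ipc = 2         # theoretical IPC from page 9 of the GTC presentation

print(warp_instructions_per_cycle(fp32_throughput))  # 1.0
print(thread_ops_per_cycle(fermi_ipc))               # 64
```

Under that assumption the two figures still differ by a factor of two (1 vs. 2 warp instructions per cycle, or 32 vs. 64 thread operations per cycle).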

How do the two quantities relate?

Licensed under: CC-BY-SA with attribution