There are a few architectural differences that I noticed while researching bitcoin mining rigs. Here is an article from Ars Technica that tries to explain some of them.
Basically, you have to remember that a GPU is NOT general purpose in the sense that a CPU is. You need to define your application first, then check which architecture suits it better. I am unsure how TFLOPS values are calculated, but I have a hunch that those numbers are more of marketing value than technical value, unless you are comparing very similar models. Even in the realm of supercomputer clusters, many factors besides raw power per CPU affect the actual throughput of the system, for example the interconnect wiring and memory access patterns.
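For what it's worth, the headline TFLOPS figure is usually a theoretical peak: number of cores times clock speed times floating-point operations per core per cycle (a fused multiply-add is counted as 2). Here is a rough sketch of that arithmetic in Python, plus a simple roofline-style bound showing how memory bandwidth can cap what you actually get. The function names and all the hardware numbers are hypothetical, picked only for illustration, not taken from any real card.

```python
def peak_tflops(cores: int, clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical peak in TFLOPS: cores * clock (GHz) * FLOPs per core per cycle.

    flops_per_cycle=2 assumes one fused multiply-add per core per cycle.
    """
    return cores * clock_ghz * flops_per_cycle / 1000.0


def attainable_tflops(peak: float, bandwidth_gb_s: float, flops_per_byte: float) -> float:
    """Roofline-style upper bound: the lower of the compute peak and
    memory bandwidth (GB/s) * arithmetic intensity (FLOPs per byte moved)."""
    memory_bound = bandwidth_gb_s * flops_per_byte / 1000.0
    return min(peak, memory_bound)


# Hypothetical card: 2048 cores at 1.0 GHz with 200 GB/s of memory bandwidth.
peak = peak_tflops(2048, 1.0)

# A workload doing only 0.25 FLOPs per byte it reads is limited by memory.
memory_heavy = attainable_tflops(peak, 200.0, 0.25)

# A workload doing 100 FLOPs per byte can get close to the advertised peak.
compute_heavy = attainable_tflops(peak, 200.0, 100.0)

print(f"peak: {peak:.3f} TFLOPS")
print(f"memory-heavy workload: {memory_heavy:.3f} TFLOPS")
print(f"compute-heavy workload: {compute_heavy:.3f} TFLOPS")
```

With these made-up numbers, the same card ranges from a small fraction of its advertised figure to the full peak depending only on the workload, which is why the figure is hard to compare across architectures.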