Hyper-threading and gaming (and other computing applications)?

Question 1

First, it should be recognized that hyperthreading is an Intel marketing term labelling Switch-on-Event MultiThreading (on Itanium) and Simultaneous MultiThreading (on x86). SoEMT is primarily beneficial in hiding high latency events such as last level cache misses, is easier to implement, and is friendlier to VLIW-like scheduling. SoEMT is also a better fit for a small L1 (given a somewhat fast L2) than SMT since cache contention is moved more to L2 or L3 (thousands of accesses between thread switches) which can better handle contention given their greater capacity and higher associativity. SMT can be useful in hiding smaller latencies like branch resolution delay or L2 cache hits and provides instruction level parallelism, but introduces more intense contention for resources.

(There is also a difference between disabling hyperthreading and not using hyperthreading. Disabling hyperthreading might provide a small performance benefit in that some shareable resources will be used even by an inactive but enabled thread and some partitioned resources may still use a small amount of power, but the primary benefit would be in preventing the OS from making disruptive scheduling decisions.)

For "normal" code, the available thread-level parallelism may well be lower than the number of cores available. In that case, a modern OS typically will not use the hardware multithreading since it recognizes that a full core has more performance than a core shared by more than one thread. (Sharing a core can theoretically improve performance in special cases where using L1 to communicate between threads is unusually helpful. In addition, waking an inactive thread on an active core is much faster and requires less energy than waking up a core, so using multithreading might be helpful for energy efficiency in some special cases.)

HPC codes tend to be the worst case for SMT. HPC code is more likely to be friendly to static scheduling. This means that the latency hiding benefits of SMT tend to be minimized. (Similarly, HPC code tends to benefit less from out-of-order execution.) HPC code also tends to be constrained by memory bandwidth rather than memory latency. SMT can increase the bandwidth demand per unit of execution (by increasing cache misses) and reduce the actual achieved memory bandwidth by contention at the memory controller. (DRAM is not friendly to random access; such causes excessive refresh and row active cycles.) SMT may also cause the number of data streams that are active to exceed the hardware's support for prefetching. HPC code is also more likely to be blocked according to cache sizes assuming one thread per core; in such cases SMT will produce significant cache thrashing.

Disabling hyperthreading may also be friendlier to gang-scheduled operation, which is common in HPC. If only some of the cores are using multithreading, those cores might have higher performance per core yet would have lower performance per thread; that forces other cores to idly wait for the slowed threads to complete. (HPC systems may have dedicated OS cores and spare cores to avoid similar problems, where OS activity would slow down one core/thread and force hundreds of others to wait or where a failed core could cause, e.g., a 16-thread gang scheduled program to run 15 threads and then one thread, doubling execution time.)

(In theory, SMT could be used in HPC to reduce register pressure in some optimized loops since the effective latency of operations like FMADD in a dual threaded core may be viewed as roughly being halved. Since compilers generally use a fixed latency for scheduling [SMT is treated as a transparent feature], exploiting this feature is not generally practical even when it could be beneficial.)

Rather like out-of-order execution, SMT is most beneficial for irregular code. (OoO looks ahead in a single code stream for instruction level and memory level parallelism; SMT looks "sideways" across threads for such parallelism.) If branch mispredictions and cache misses are common, SMT can use existing thread-level parallelism to hide such latencies (the cost of a branch misprediction is largely in the latency of resolution).

The benefit from SMT varies by workload and by the specific hardware. A deeply pipelined in-order microarchitecture like the initial Intel Atom benefits more from SMT than a shallower pipelined OoO microarchitecture would (latencies, especially branch resolution latency, being generally higher with longer pipelines and OoO providing some parallelism that would otherwise be used by SMT's thread-level parallelism).

Enabled hyperthreading may also have the disadvantage of increasing the number of threads used by an application where performance scaling with increased thread count is sufficiently sublinear that the lower performance per thread with hyperthreading would result in a net loss of performance. E.g., if two-thread-per-core hyperthreading provided a 30% increase in per core performance and doubling thread count increased performance by 50%, then total performance would decrease by 2.5%.

The standard advice of "when in doubt, measure" obviously applies.

Question 2

Obviously some people don't understand some things. I have done so, here is what I copied froma site:

Depending on when you last bought a computer, you may remember Hyper-Threading as a feature that Intel introduced and then discontinued. This could understandably leave a sour taste in your mouth – why would Intel discontinue it if it wasn’t trouble? The truth isn’t so grim. Hyper-Threading was for a time made available on certain Intel Pentium 4 and Intel Xeon processors. It was discontinued not because the feature itself was bad, but rather because the processor that used it turned out to be a bit of a misstep for other reasons. The Pentium 4 architecture was a minor disaster for Intel because it was incapable of going the direction Intel hoped (Intel wanted to have Pentium 4 processors with clock speeds of up to 10 GHz). As a result, Intel jumped back to designing processors based on the Pentium Pro family tree. Hyper-Threading was gone, but not forgotten. Intel eventually found the time and resources to integrate it into another new processor architecture - Nehalem. This is the architecture that is the basis for all current Intel Core i3, i5 and i7 processors.

Source: http://www.makeuseof.com/tag/hyperthreading-technology-explained/