Question

I have a dual-core machine with 4 logical processors thanks to hyper-threading. I am running a SHA-1 pre-image brute-force test in C#. In each thread I basically have a for loop that computes a SHA-1 hash and then compares it to the hash I am looking for. I made sure that all threads execute in complete separation; no memory is shared between them, except one variable, long count, which I increment in each thread using:

System.Threading.Interlocked.Increment(ref count);

I get about 1 million SHA-1/s with 2 threads and 1.3 million SHA-1/s with 4 threads. I fail to see why I get a 30% bonus from HT in this case. Both cores should already be busy doing their work, so increasing the number of threads beyond 2 should not give me any benefit. Can anyone explain why?
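For reference, a minimal sketch of the setup described above. The thread count, iteration count, and target hash are hypothetical placeholders; the actual comparison logic is elided.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Threading;

class BruteForceTest
{
    public static long count = 0;

    // Each worker iterates over its own candidate range; no state is
    // shared between threads except the Interlocked counter.
    static void Worker(int threadId, int iterations, byte[] target)
    {
        using (var sha1 = SHA1.Create())
        {
            for (int i = 0; i < iterations; i++)
            {
                byte[] candidate = Encoding.ASCII.GetBytes($"{threadId}-{i}");
                byte[] hash = sha1.ComputeHash(candidate);
                // Compare hash against the target pre-image hash here.
                Interlocked.Increment(ref count);
            }
        }
    }

    public static void Main()
    {
        int threads = 4;              // hypothetical: matches 4 logical processors
        int perThread = 100_000;      // hypothetical iteration budget
        byte[] target = new byte[20]; // placeholder target hash

        var workers = new Thread[threads];
        for (int t = 0; t < threads; t++)
        {
            int id = t; // capture a stable copy of the loop variable
            workers[t] = new Thread(() => Worker(id, perThread, target));
            workers[t].Start();
        }
        foreach (var w in workers) w.Join();

        Console.WriteLine(count); // 400000 (threads * perThread)
    }
}
```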

Was it helpful?

Solution

Hyper-threading effectively gives you more cores for integer operations: it allows two sets of integer operations to run in parallel on a single physical core. As far as I'm aware it doesn't help floating-point operations, but SHA-1 is presumably composed primarily of integer operations, hence the speed-up.

It's not as good as having 4 real physical cores, of course - but it does allow for a bit more parallelism.
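One way to see this effect is to measure throughput at different thread counts on the same workload. The sketch below is an illustrative harness, not the asker's original code; the iteration count and input size are arbitrary choices.

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;
using System.Threading;

class HtBenchmark
{
    // Measure total SHA-1 hashes per second for a given thread count.
    // Each thread hashes its own private buffer, so no memory is shared.
    public static double MeasureHashesPerSecond(int threadCount, int iterationsPerThread)
    {
        var threads = new Thread[threadCount];
        var sw = Stopwatch.StartNew();
        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                byte[] input = new byte[64]; // arbitrary fixed-size input
                using (var sha1 = SHA1.Create())
                {
                    for (int i = 0; i < iterationsPerThread; i++)
                        sha1.ComputeHash(input);
                }
            });
            threads[t].Start();
        }
        foreach (var th in threads) th.Join();
        sw.Stop();
        return threadCount * iterationsPerThread / sw.Elapsed.TotalSeconds;
    }

    public static void Main()
    {
        // On a 2-core/4-thread machine, expect a jump from 1 to 2 threads
        // and a smaller bump from 2 to 4, as described above.
        foreach (int n in new[] { 1, 2, 4 })
            Console.WriteLine($"{n} threads: {MeasureHashesPerSecond(n, 200_000):F0} sha1/s");
    }
}
```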

Other tips

Disable HT in the BIOS and run the test again with 2 threads. HT gives a small speedup mainly when one virtual core is executing integer instructions while the other is executing instructions that use the FPU registers.

SMT/hyper-threading allows multiple threads (usually two) to execute on the same physical core: typically one thread runs until it encounters a stall, at which point the core switches to the thread that was waiting.

Stalls happen, mostly on cache misses. Even if you are not traversing the same memory, there is no guarantee that the memory will already be in the cache (accessing it then induces a stall), or that it will not map to the same cache line that another thread is mapping memory to.

Thus, two threads will almost always benefit from SMT/hyper-threading, unless the data they traverse is already present in the cache. That is actually an unusual scenario: an algorithm would typically need to prefetch its data, avoid using more than the cache can hold, and avoid overwriting memory that other threads are trying to keep cached, all of which requires knowledge of the other threads on the core. That is not usually possible, because the OS abstracts it away.

Most algorithms are not tuned to that extent, particularly since it is usually only console-exclusive games, or other hardware-exclusive applications, that can guarantee a certain minimum spec for the cache and, more importantly, have intimate knowledge of the other threads running concurrently on the same core. This is also one of the major reasons larger caches benefit modern CPU performance.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow