Is there a task scheduler that is well-suited for floating-point calculations on processors with hyper-threading?

Question 1

To get good performance out of IPP FFT, I had to spin off one task per physical core, that is, cores per package times the number of packages, rather than one task per logical processor.
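A minimal sketch of capping the Task Parallel Library at one worker per physical core. It assumes Hyper-Threading is on and exposes exactly two logical processors per core, so the physical count is estimated as half of Environment.ProcessorCount. The names PhysicalCoreParallel, PhysicalCoreCount and Run are made up for illustration, and the loop body stands in for the actual IPP FFT call.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class PhysicalCoreParallel
{
    // Assumption: Hyper-Threading exposes two logical processors per physical core.
    const int LogicalPerPhysicalCore = 2;

    // Estimate of the physical core count (cores per package times packages).
    public static int PhysicalCoreCount()
    {
        return Math.Max(1, Environment.ProcessorCount / LogicalPerPhysicalCore);
    }

    // Runs body(i) for i in [0, workItems) with at most one worker per physical core.
    public static void Run(int workItems, Action<int> body)
    {
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = PhysicalCoreCount()
        };
        Parallel.For(0, workItems, options, body);
    }

    static void Main()
    {
        int processed = 0;
        // Placeholder work item; a real program would run one FFT batch here.
        Run(64, i => Interlocked.Increment(ref processed));
        Console.WriteLine("workers: " + PhysicalCoreCount() + ", items: " + processed);
    }
}
```

Note that this only limits how many tasks run at once. It does not pin them to particular cores, so the operating system still decides which logical processors they land on.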

With NUMA nodes enabled, another scalability problem had to be addressed by enabling gcServer in the app config file. This seems to ensure that memory is allocated evenly on each of the NUMA nodes.
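For reference, the setting lives under the runtime element of the application configuration file. A minimal fragment, assuming an app.config with nothing else in it:

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <!-- Use the server garbage collector instead of the workstation one -->
    <gcServer enabled="true" />
  </runtime>
</configuration>
```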

With HT enabled and Intel Turbo Boost on, I see less than 50% CPU utilization, often as low as 35%. With Turbo Boost off, I see a consistent 50% CPU load, which is what one task per physical core should look like when each core shows up as two logical processors.

It's nice to see that, in the .NET 4.5 Task Parallel Library, server-class performance tuning is externalized into configuration. It would be even nicer to get it for free, always.

Details: tested on a dual Xeon E5 v1 machine running Windows Server 2008 R2 SP1 Enterprise.

Question 2

This just isn't the way hyper-threading works. There is no such "assignment" and there is no concept of a "float-point thread per core". The core dynamically picks one of the available floating-point execution units. There are several of them, and they don't all have the same capabilities. Having many execution engines is what makes hyper-threading work in the first place. Artificially bypassing logical cores that might be hyper-threaded doesn't make your code faster. It makes it slower, because you may well pass up the opportunity to use an otherwise idle engine.

I know from your other question that you don't actually have this working yet, so this is very likely a case of premature optimization. Get it running first and find out whether it is good enough. If it falls short, move to better hardware, a Xeon-class processor for example.