Some problems like solving pi are more suited to parallelization and hyperthreading can actually give you a speed up. When you are dealing with a heavy memory problem like you are, Hyperthreading can't help and can actually hurt. Check out "pipelining" in CPU architecture.
There are not many practical problems for which you can get a 2x speedup by using 2-cpus. The more cpus, the more overhead. In your test case algorithm, I suspect cores are having to wait for the memory subsystem. If you tweak the memory requirements, you will see an increase in performance (and utilization) as you move the memory requirements closer to the CPU cache-size.