Question

Is increased CPU time (as reported by the time CLI command) indicative of inefficiency when hyperthreading is used (e.g. time spent in spinlocks or on cache misses), or is it possible that the CPU time is inflated by the odd nature of HT (e.g. the real cores being busy, so HT can't kick in)?

I have a quad-core i7, and I'm testing a trivially parallelizable part (image-to-palette remapping) of an OpenMP program, with no locks and no critical sections. All threads access a bit of read-only shared memory (a look-up table), but each writes only to its own memory.
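
The loop is essentially of this shape (a simplified sketch, not the real code; names are illustrative):

    /* Each thread reads the shared, read-only look-up table and
       writes only its own slice of the output. No locks, no
       critical sections. */
    void remap(const unsigned char *in, unsigned char *out,
               long n, const unsigned char lut[256])
    {
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < n; i++)
            out[i] = lut[in[i]];
    }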

 cores  real (s)  CPU (s)
   1:     5.8       5.8
   2:     3.7       5.9
   3:     3.1       6.1
   4:     2.9       6.8
   5:     2.8       7.6
   6:     2.7       8.2
   7:     2.6       9.0
   8:     2.5       9.7

I'm concerned that the amount of CPU time used increases rapidly as the number of cores exceeds 1 or 2.

I imagine that in an ideal scenario CPU time wouldn't increase much (the same amount of work just gets distributed over multiple cores).

Does this mean there's 40% overhead spent on parallelizing the program?


Solution

It's quite possibly an artefact of how CPU time is measured. A trivial example: if you run a 100 MHz CPU and a 3 GHz CPU for one second each, each will report that it ran for one second. The second CPU might do 30 times more work, but it still reports just one second of CPU time.

With hyperthreading, a reasonable (if not quite accurate) model is that one core can run either one task at, let's say, 2000 MHz, or two tasks at, let's say, 1200 MHz each. Running two tasks, the core does only 60% of the work per thread, but 120% of the work for both threads together, a 20% improvement. But if the OS asks how many seconds of CPU time were used, the first case reports "1 second" after each second of real time, while the second reports "2 seconds".

So the reported CPU time goes up. As long as it less than doubles, overall performance has still improved.
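
One way to see this directly is to measure wall-clock time and process CPU time around the same parallel loop. A minimal sketch in C with OpenMP (illustrative, not the asker's program):

    #include <stdio.h>
    #include <time.h>
    #include <omp.h>

    int main(void)
    {
        double wall_start = omp_get_wtime();   /* wall-clock time */
        clock_t cpu_start = clock();           /* process CPU time, summed over threads */

        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (long i = 1; i < 100000000L; i++)
            sum += 1.0 / (double)i;            /* embarrassingly parallel busywork */

        double wall = omp_get_wtime() - wall_start;
        double cpu  = (double)(clock() - cpu_start) / CLOCKS_PER_SEC;

        /* With hyperthreading, expect wall time to fall while CPU time
           rises; the run is still a net win whenever the wall-time
           speed-up outpaces the CPU-time growth. */
        printf("wall: %.2f s  cpu: %.2f s  (sum = %f)\n", wall, cpu, sum);
        return 0;
    }

Run it with varying OMP_NUM_THREADS and you should see the same pattern as the table above: real time falls while CPU time climbs.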

OTHER TIPS

Quick question - are you running the genuine time program /usr/bin/time, or the bash built-in of the same name? I'm not sure that it matters; they look very similar.

Looking at your table of numbers, I sense that the processed data set (i.e. the input plus all the output data) is reasonably large overall (bigger than the L2 cache), and that the processing per data item is not that lengthy.

The numbers show a nearly linear improvement from 1 to 2 cores, but that is tailing off significantly by the time you're using 4 cores. The hyperthreaded cores are adding virtually nothing. This means that something shared is being contended for. Your program has free-running threads, so that something can only be memory (the L3 cache and main memory on the i7).

This sounds like a typical example of being I/O bound rather than compute bound, the I/O in this case being to/from the L3 cache and main memory. The L2 cache is 256 KB per core, so I'm guessing that the size of your input data plus one set of results and all intermediate arrays is bigger than 256 KB; even a modest 1024x768 RGB image is over 2 MB, for example.

Am I near the mark?

Generally speaking, when considering how many threads to use, you have to take shared cache and memory speeds and data set sizes into account. That can be a right bugger, because you have to work it out at run time, which is a lot of programming effort (unless your hardware configuration is fixed).
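
As a rough sketch of what "working it out at run time" could look like: on Linux/glibc the cache sizes can be queried with sysconf, so a program can cap its thread count when the combined working sets would overflow the shared L3. The _SC_LEVEL3_CACHE_SIZE query is a glibc extension, and the working-set figure here is made up for illustration:

    #include <stdio.h>
    #include <unistd.h>
    #include <omp.h>

    /* Pick a thread count such that the combined per-thread working
       sets still fit in the shared L3 cache. */
    static int pick_threads(long working_set_bytes)
    {
        long l3 = sysconf(_SC_LEVEL3_CACHE_SIZE);  /* glibc extension */
        int  hw = omp_get_max_threads();           /* logical CPUs, HT included */

        if (l3 <= 0)
            return hw;              /* no cache info; just use them all */

        long fit = l3 / working_set_bytes;
        if (fit < 1)  fit = 1;      /* always run at least one thread */
        if (fit > hw) fit = hw;     /* never exceed available CPUs */
        return (int)fit;
    }

    int main(void)
    {
        int n = pick_threads(2L * 1024 * 1024);  /* 2 MB per thread, illustrative */
        omp_set_num_threads(n);
        printf("using %d threads\n", n);
        return 0;
    }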

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow