In general, you are most efficient when using only one thread. Making work parallel inevitably introduces cost: thread creation, scheduling, and synchronization. You gain throughput only when the extra work you can do in parallel outweighs that cost.
Now, Amdahl's law describes the theoretical gain in throughput as a function of how much of your work can be parallelized. For example, if only 50% of your task is parallelizable, you can get at most a 2x increase in throughput no matter how many threads you throw at the problem. Note that the chart in the link ignores the cost of adding threads. In reality, native OS threads add quite a bit of overhead, especially when many of them contend for a shared resource.
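You can see the 2x ceiling directly by evaluating the formula. Here is a short Python snippet (the formula is standard Amdahl's law; the code itself is just an illustration):

    # Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n)
    # p = parallelizable fraction, n = number of threads
    def amdahl_speedup(p: float, n: int) -> float:
        return 1.0 / ((1.0 - p) + p / n)

    # With p = 0.5 the speedup approaches, but never reaches, 2x:
    for n in (1, 2, 4, 8, 1024):
        print(n, round(amdahl_speedup(0.5, n), 3))
    # 1 -> 1.0, 2 -> 1.333, 4 -> 1.6, 8 -> 1.778, 1024 -> 1.998

As n grows, the p/n term vanishes and the serial fraction (1 - p) dominates, which is why the curve flattens out at 1 / (1 - p).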
In your case, when you used only one socket, most of your work was not parallelizable. A single thread therefore gave the best performance, and adding threads made things worse because of the overhead they introduced. In your second experiment, using more than one socket increased the fraction of work that could be parallelized, so you gained throughput despite the added cost of the threads, as the sketch below illustrates.
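A minimal sketch of the one-thread-per-socket pattern, assuming Python with blocking reads (`sockets` and `process` are hypothetical placeholders for your own connection list and per-message work):

    import threading

    def handle(sock) -> None:
        # Each socket's blocking reads proceed independently of the
        # others, so this loop is the parallelizable part of the work.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            process(data)  # hypothetical per-message work

    # With one socket there is nothing to overlap, so extra threads only
    # add contention; with several sockets the reads genuinely run in
    # parallel and the thread overhead pays for itself.
    threads = [threading.Thread(target=handle, args=(s,)) for s in sockets]
    for t in threads:
        t.start()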