Setting Ideal size of Thread Pool [duplicate]

Question 1

Ideally, if your threads take no locks and so do not block each other (they are independent of each other), and the workload (processing) is the same for each, then a pool size of Runtime.getRuntime().availableProcessors() or availableProcessors() + 1 gives the best results.
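As a minimal sketch of that rule of thumb, the snippet below sizes a fixed pool from the core count. The class name CpuBoundPool, the helper poolSize and the busy-loop task are made up for illustration; only Runtime.getRuntime().availableProcessors() comes from the answer itself.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class CpuBoundPool {
    // Hypothetical helper: one thread per core plus one spare,
    // i.e. the "availableProcessors() + 1" variant mentioned above.
    static int poolSize(int cores) {
        return cores + 1;
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize(cores));

        // Submit independent, lock-free, CPU-only tasks (a stand-in workload).
        for (int i = 0; i < 2 * cores; i++) {
            final int id = i;
            pool.submit(() -> {
                long sum = 0;
                for (int k = 0; k < 1_000_000; k++) {
                    sum += k;
                }
                System.out.println("task " + id + " finished, sum=" + sum);
            });
        }

        pool.shutdown();                             // accept no new tasks
        pool.awaitTermination(1, TimeUnit.MINUTES);  // wait for submitted tasks
    }
}
```

Whether the plain core count or the "+ 1" variant works better is something to measure on your own workload.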

But if the threads interfere with each other or involve I/O, then Amdahl's law explains the behaviour pretty well. From the wiki:

Amdahl's law states that if P is the proportion of a program that can be made parallel (i.e., benefit from parallelization), and (1 − P) is the proportion that cannot be parallelized (remains serial), then the maximum speedup that can be achieved by using N processors is

$$S(N) = \frac{1}{(1 - P) + \frac{P}{N}}$$
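To get a feel for the numbers, here is a small sketch that evaluates Amdahl's law for a few processor counts. The class name Amdahl and the value P = 0.9 are assumptions picked for illustration, not figures from the question.

```java
class Amdahl {
    // Maximum speedup for parallel fraction p on n processors:
    // 1 / ((1 - p) + p / n)
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        double p = 0.9; // assumed parallel fraction, purely illustrative
        for (int n : new int[] {1, 2, 4, 8, 16}) {
            System.out.printf("N=%d speedup=%.2f%n", n, speedup(p, n));
        }
    }
}
```

Note how the serial part caps the gain: with P = 0.9 the speedup can never exceed 10, however many processors are added.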

In your case, you need to come up with a solution based on the number of cores available and on what exactly the threads do (pure computation? I/O? hold locks? block on some resource? etc.).

For example: some months back I was collecting data from numerous websites. My machine had 4 cores and I started with a pool size of 4. But because the operation was purely I/O-bound and my network speed was decent, I found that I got the best performance with a pool size of 7. That is because the threads were not competing for computational power but waiting on I/O, so the extra threads could make use of the cores while the others were blocked.
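One common heuristic for this situation, given in Java Concurrency in Practice, is threads = cores * target CPU utilization * (1 + wait time / compute time). The sketch below simply evaluates it. The class and parameter names are made up, and the wait and compute times are values you would have to measure yourself.

```java
class IoBoundSizing {
    // Heuristic: cores * utilization * (1 + wait / compute).
    // waitTime and computeTime only need to share the same unit.
    static int poolSize(int cores, double targetUtilization,
                        double waitTime, double computeTime) {
        double size = cores * targetUtilization * (1.0 + waitTime / computeTime);
        return Math.max(1, (int) Math.round(size)); // never go below one thread
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Assumed figures: each task waits 75 ms on I/O per 100 ms of computation.
        System.out.println("suggested pool size: " + poolSize(cores, 1.0, 75, 100));
    }
}
```

With a hypothetical wait-to-compute ratio of 0.75, a 4-core machine comes out at 7 threads, which happens to match the anecdote above. The real ratio in that case is unknown, so treat the result as a starting point for experiments rather than an answer.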

PS: I suggest reading the chapter on performance in the book Java Concurrency in Practice by Brian Goetz. It deals with such matters in detail.

Question 2

So now I am trying to understand, from an architecture point of view, what does the number of threads mean here?

Each thread has its own stack memory, program counter (like a pointer to the instruction that executes next) and other local resources. Swapping threads in and out (context switching) costs time, which hurts latency for a single task. The benefit is that while one thread is idle (usually when waiting for I/O) another thread can get work done. Also, if multiple processors are available, threads can run in parallel as long as there is no resource or locking contention between the tasks.

And how to decide what is the optimal number of threads I should choose?

The trade-off between the cost of swapping and the opportunity to avoid idle time depends on the details of what your task looks like (how much I/O, and when, with how much work between I/O operations, and how much memory it needs to complete). Experimentation is always the key.

And if I am using a larger number of threads, then what will happen?

There will usually be linear-ish growth in throughput at first, then a relatively flat part, then a drop (which may be quite steep). Each system is different.
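A rough way to see that curve on your own machine is to run the same batch of tasks at several pool sizes and compare throughput. The harness below is only a sketch: PoolSizeExperiment is a made-up name, and Thread.sleep stands in for a real I/O-bound task, so replace it with your actual work before drawing conclusions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolSizeExperiment {
    // Runs `tasks` simulated tasks on a pool of `poolSize` threads
    // and returns the measured throughput in tasks per second.
    static double throughput(int poolSize, int tasks, long sleepMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<Void>> work = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                work.add(() -> {
                    Thread.sleep(sleepMillis); // stand-in for blocking I/O
                    return null;
                });
            }
            long start = System.nanoTime();
            pool.invokeAll(work); // blocks until every task has completed
            double seconds = (System.nanoTime() - start) / 1e9;
            return tasks / seconds;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("experiment interrupted", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (int size : new int[] {1, 2, 4, 8, 16, 32}) {
            System.out.printf("pool=%d throughput=%.1f tasks/s%n",
                    size, throughput(size, 64, 20));
        }
    }
}
```

A pure-sleep workload never shows the drop, because sleeping threads do not compete for anything. The flat part and the decline only appear once tasks contend for CPU, memory, locks or a shared connection.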

Question 3

Looking at Amdahl's law is fine, especially if you know exactly how big P and N are. Since that will never really happen, you could monitor the performance (which you should do anyway) and increase or decrease your thread pool size to optimize whatever performance metrics are important to you.
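If you go the monitoring route, a java.util.concurrent.ThreadPoolExecutor can be resized while it is running. The sketch below shows only the resizing step. The ResizablePool class and its resize helper are hypothetical, and deciding when to resize (which metric, which threshold) is left to your monitoring.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ResizablePool {
    // Creates a fixed-size pool (core == max) that can be resized later.
    static ThreadPoolExecutor create(int size) {
        return new ThreadPoolExecutor(size, size, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
    }

    // Sets core and max size to newSize. The order matters:
    // setMaximumPoolSize rejects a value below the current core size,
    // and newer JDKs also reject a core size above the current maximum.
    static void resize(ThreadPoolExecutor pool, int newSize) {
        if (newSize > pool.getMaximumPoolSize()) {
            pool.setMaximumPoolSize(newSize); // grow: raise the ceiling first
            pool.setCorePoolSize(newSize);
        } else {
            pool.setCorePoolSize(newSize);    // shrink: lower the floor first
            pool.setMaximumPoolSize(newSize);
        }
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create(4);
        resize(pool, 7); // e.g. after monitoring shows the threads mostly waiting
        System.out.println("core=" + pool.getCorePoolSize()
                + " max=" + pool.getMaximumPoolSize());
        pool.shutdown();
    }
}
```

When shrinking, surplus threads are not killed immediately; they exit as they become idle.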