Can a shared ready queue limit the scalability of a multiprocessor system?
06-07-2019
Question
Can a shared ready queue limit the scalability of a multiprocessor system?
Solution
Simply put, most definitely. Read on for some discussion.
Tuning a service is an art form that requires benchmarking (and the space of configurations you would need to benchmark is huge). I believe it depends on factors such as the following (this is not exhaustive):
- how much time an item picked up from the ready queue takes to process,
- how many worker threads there are,
- how many producers there are, and how often they produce, and
- what type of wait primitives you are using: spin-locks or kernel waits (the latter being slower).
So, if items are produced often, the number of threads is large, and the per-item processing time is low, the data structure can be locked for large windows, causing contention and thrashing.
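As a minimal sketch of the situation above (all names here are hypothetical, not from any real scheduler): a single ready queue guarded by one lock, with every producer and every worker serializing on that same lock. The more threads you add, the more time each spends waiting rather than working.

```python
import threading
from collections import deque

class SharedReadyQueue:
    """Hypothetical shared ready queue: one lock protects everything."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()   # the single point of contention

    def put(self, item):
        with self._lock:                # every producer serializes here
            self._items.append(item)

    def get(self):
        with self._lock:                # ...and every worker too
            return self._items.popleft() if self._items else None

def worker(q, results):
    while True:
        item = q.get()
        if item is None:                # queue drained, stop
            break
        results.append(item * 2)        # stand-in for "processing"

q = SharedReadyQueue()
for i in range(1000):
    q.put(i)

results = []
threads = [threading.Thread(target=worker, args=(q, results)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))  # 1000
```

The result is correct, but all four workers funnel through `self._lock`; with short per-item work, the lock hold time dominates and adding threads stops helping.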
Other factors include the data structure used and how long it stays locked. For example, if you use a linked list to manage such a queue, the add and remove operations take constant time; a priority queue (a heap) takes a few more operations on average, logarithmic in the queue size, when items are added or removed.
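The cost difference can be seen with Python's standard containers: `collections.deque` gives the constant-time FIFO behaviour of a linked list, while `heapq` implements the heap-backed priority queue.

```python
import heapq
from collections import deque

# O(1) FIFO queue: add at the tail, remove from the head.
fifo = deque()
fifo.append((5, "task-a"))
fifo.append((1, "task-b"))
first = fifo.popleft()               # arrival order, regardless of priority

# O(log n) priority queue: each push/pop sifts through the heap.
heap = []
heapq.heappush(heap, (5, "task-a"))
heapq.heappush(heap, (1, "task-b"))
top = heapq.heappop(heap)            # smallest priority value comes out first

print(first)  # (5, 'task-a')
print(top)    # (1, 'task-b')
```

Under a lock, those extra heap operations translate directly into a longer critical section per enqueue/dequeue.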
If your system is for business processing, you could take this question out of the picture by:
- using a process-based architecture: spawning multiple producer-consumer processes and using the file system for communication, or
- using a non-preemptive, cooperative threading language such as Stackless Python, Lua, or Erlang.
Also note: synchronization primitives cause inter-processor cache-coherency traffic, which is expensive and is another reason they should be used sparingly.
The discussion could go on to fill a Ph.D. dissertation :D
OTHER TIPS
A per-CPU ready queue is a natural choice for this data structure, because most operating systems try to keep a process on the same CPU, for many reasons you can google for. What does that imply? If a thread is ready and another CPU is idling, the OS will not immediately migrate the thread to that CPU; load balancing only kicks in over the long run.
Had the situation been different, that is, if keeping thread-CPU affinity were not a design goal and thread migration were frequent, then keeping separate per-CPU run queues would be costly.
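The per-CPU design above can be sketched as follows. This is a hypothetical toy scheduler, not any real kernel's implementation: each CPU works from its own queue on the common path (no shared lock, good cache affinity), and only an idle CPU reaches into another queue, mimicking the "load balancing kicks in over the long run" behaviour.

```python
from collections import deque

class PerCpuScheduler:
    """Toy scheduler with one run queue per CPU (illustrative only)."""

    def __init__(self, num_cpus):
        self.queues = [deque() for _ in range(num_cpus)]

    def enqueue(self, task, cpu):
        # Affinity: keep the task on the CPU it last ran on.
        self.queues[cpu].append(task)

    def pick_next(self, cpu):
        if self.queues[cpu]:
            # Common, contention-free path: take from our own queue.
            return self.queues[cpu].popleft()
        # Rare slow path: this CPU is idle, steal from the longest queue.
        victim = max(self.queues, key=len)
        return victim.popleft() if victim else None

sched = PerCpuScheduler(num_cpus=2)
for t in range(6):
    sched.enqueue(f"task-{t}", cpu=0)       # all work lands on CPU 0

ran_on_cpu1 = [sched.pick_next(1) for _ in range(3)]  # CPU 1 steals work
print(ran_on_cpu1)  # ['task-0', 'task-1', 'task-2']
```

If migration were frequent, nearly every `pick_next` would hit the stealing path and touch other CPUs' queues, which is exactly the costly case the last paragraph describes.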