Question

I recently learned that sometimes people will lock specific processes or threads to specific processors or cores, thinking that this manual tuning will best distribute the load. This is a bit counter-intuitive to me -- I would think the OS scheduler could make a better decision than a human about how to spread the load. I could see it being true for older operating systems that perhaps weren't aware of issues like there being more latency between specific pairs of cores, or a cache being shared between one pair of cores but not another pair. But I assume 'modern' OSes like Linux, Solaris 10, OS X, and Vista have schedulers that know this information. Am I mistaken about their capabilities? Am I mistaken that it's a problem the OS can actually solve? I'm particularly interested in the answer for Solaris and Linux.

The practical consequence is whether or not I need to inform users of my (multithreaded) software about how they might consider balancing it on their box.


Solution

First of all, 'lock' is not the correct term for this; 'affinity' is the more suitable term.

In most cases you don't need to care about it. However, in some cases, manually setting CPU/process/thread affinity can be beneficial.

Operating systems are often oblivious to the details of modern multicore architecture. For example, say we have a system with two quad-core processors, and the processor supports SMT (= HyperThreading). In this case we have 2 processors, 8 cores, and 16 hardware threads, so the OS sees 16 logical processors. If the OS does not recognize this hierarchy, it is likely to lose some performance gains. The reasons are:

  1. Caches: in our example, the two processors (installed in two different sockets) do not share any on-chip cache. Say an application has 4 busy-running threads and the threads share a lot of data. If the OS schedules those threads across the two processors, we may lose cache locality, resulting in a performance loss. However, if the threads do not share much data (i.e., they have distinct working sets), then separating them onto different physical processors can be better, since it increases the effective cache capacity. Trickier scenarios can also arise that are very hard for an OS to detect.

  2. Resource conflicts: consider the SMT (= HyperThreading) case. SMT shares many important CPU resources, such as caches, the TLB, and execution units, between the logical processors of a core. Say there are only two busy threads. An OS may naively schedule these two threads onto two logical processors of the same physical core, in which case the threads contend for significant shared resources. (A sketch of how one might pin such threads to distinct physical cores follows this list.)
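As a concrete illustration of case 2, here is a minimal Linux sketch using pthread_setaffinity_np. The CPU numbers are an assumption: it presumes logical CPUs 0 and 2 live on different physical cores, which is common but must be verified against the machine's actual topology.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    /* Toy worker that just burns CPU, standing in for a busy
     * application thread. Build with: gcc -pthread pin.c */
    static void *worker(void *arg)
    {
        (void)arg;
        for (volatile unsigned long n = 0;; n++)
            ;                       /* busy loop */
        return NULL;
    }

    int main(void)
    {
        /* ASSUMPTION: logical CPUs 0 and 2 are on different physical
         * cores. On many Linux machines SMT siblings are numbered
         * 0/1, 2/3, ..., but the real mapping must be read from
         * /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. */
        int cpus[2] = { 0, 2 };
        pthread_t t[2];

        for (int i = 0; i < 2; i++) {
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(cpus[i], &set);
            pthread_create(&t[i], NULL, worker, NULL);
            /* Pin each busy thread to its own physical core so the two
             * do not contend for one core's execution units and caches. */
            if (pthread_setaffinity_np(t[i], sizeof(set), &set) != 0)
                fprintf(stderr, "could not set affinity for thread %d\n", i);
        }
        pthread_join(t[0], NULL);   /* never returns in this toy example */
        return 0;
    }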

One good example is Windows 7. Windows 7 supports a smart scheduling policy that considers SMT (related article). Windows 7 actually prevents case 2 above. Here is a snapshot of Task Manager in Windows 7 with a 20% load on a Core i7 (quad-core with HyperThreading = 8 logical processors):

[Screenshot: Windows 7 Task Manager CPU usage history (source: egloos.com)]

The CPU usage history is very interesting, isn't it? :) You can see that only one logical CPU of each pair is utilized, meaning Windows 7 avoids scheduling two threads on the same physical core simultaneously where possible. This policy definitely reduces the negative effects of SMT such as resource conflicts.

I'd say operating systems are still not very smart about modern multicore architecture, with its many levels of cache, shared last-level caches, SMT, and even NUMA. So there can be good reasons to manually set CPU/process/thread affinity.

However, I wouldn't say this is usually needed. Only try it when you fully understand your workload patterns and your system architecture, and then measure whether the change is actually effective.

OTHER TIPS

For general-purpose applications, there is no reason to set the CPU affinity; you should just let the OS scheduler choose which CPU runs each process or thread. However, there are instances where setting the CPU affinity is necessary. For example, in real-time systems, the cost of migrating a thread from one core to another (which can happen at any time if the CPU affinity has not been set) can introduce unpredictable delays that cause tasks to miss their deadlines, precluding real-time guarantees.

You can take a look at this article about a multi-core-aware implementation of real-time CORBA that, among other things, had to set the CPU affinity so that CPU migration could not result in missed deadlines.

The paper is: Real-Time Performance and Middleware for Multiprocessor and Multicore Linux Platforms
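To make the real-time case concrete, here is a minimal Linux sketch (not taken from the paper; the CPU number and priority are illustrative) of the usual recipe: pin the task to one core so migration can never add latency, and give it a real-time scheduling policy.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* 1. Pin the calling process (pid 0 = self) to CPU 3 -- an
         *    illustrative choice -- so the scheduler can never migrate it. */
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(3, &set);
        if (sched_setaffinity(0, sizeof(set), &set) == -1)
            perror("sched_setaffinity");

        /* 2. Request the SCHED_FIFO real-time policy so ordinary
         *    time-sharing tasks cannot preempt us. This normally
         *    requires root or the CAP_SYS_NICE capability. */
        struct sched_param sp;
        memset(&sp, 0, sizeof sp);
        sp.sched_priority = 50;     /* mid-range RT priority, illustrative */
        if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
            perror("sched_setscheduler");

        /* ... latency-sensitive work runs here, migration-free ... */
        return 0;
    }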

For applications designed with parallelism and multiple cores in mind, the OS's default thread affinity is sometimes not enough. There are many approaches to parallelism, but so far all require programmer involvement and knowledge, at some level at least, of the architecture onto which the solution will be mapped. This includes the machines, CPUs, and threads that are involved.

This is an actively researched subject, and there is an excellent course on MIT's OpenCourseWare that delves into these issues: http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-189January--IAP--2007/CourseHome/

Something many people haven't considered here is the idea of forbidding two processes from running on the same processor (socket). It can be worthwhile to help the system by binding different heavily used processes to different processors. This can avoid contention if the scheduler is not clever enough to figure it out itself.

But this is more a system administration task than one for programmers. I have seen optimizations like this on a few high-performance database servers.
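On Linux an administrator would typically do this with the taskset utility, but the same thing can be done programmatically. A minimal sketch follows; the assumption that CPUs 0-3 make up socket 0 is illustrative and must be checked against the actual machine.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>

    /* Restrict an already-running process (say, a database server) to
     * the CPUs of one socket -- roughly what `taskset -pc 0-3 <pid>`
     * does. */
    static int bind_pid_to_cpus(pid_t pid, const int *cpus, int ncpus)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        for (int i = 0; i < ncpus; i++)
            CPU_SET(cpus[i], &set);
        return sched_setaffinity(pid, sizeof(set), &set);
    }

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <pid>\n", argv[0]);
            return 1;
        }
        /* ASSUMPTION: CPUs 0-3 are the cores of socket 0 here; check
         * /sys/devices/system/cpu/cpuN/topology/physical_package_id. */
        int socket0[] = { 0, 1, 2, 3 };
        if (bind_pid_to_cpus((pid_t)atoi(argv[1]), socket0, 4) == -1) {
            perror("sched_setaffinity");
            return 1;
        }
        return 0;
    }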

Most modern operating systems will do an effective job of allocating work between cores. They also attempt to keep threads running on the same core, to get the cache benefits you mentioned.

In general, you should never set thread affinity unless you have a very good reason to do so. You don't have as good an insight as the OS into the other work that threads on the system are doing. Kernels are constantly being updated to account for new processor technology (single CPU per socket, to hyperthreading, to multiple cores per socket). Any attempt by you to set hard affinity may backfire on future platforms.

This article from MSDN Magazine, Using concurrency for scalability, gives a good overview of multithreading on Win32. Regarding CPU affinity:

Windows automatically employs so-called ideal processor affinity in an attempt to maximize cache efficiency. For example, a thread running on CPU 1 that gets context switched out will prefer to run again on CPU 1 in the hope that some of its data will still reside in cache. But if CPU 1 is busy and CPU 2 is not, the thread could be scheduled on CPU 2 instead, with all the negative cache effects that implies.

The article also warns that CPU affinity shouldn't be manipulated without a deep understanding of the problem. Based on this information, my answer to your question would be No, except for very specific, well-understood scenarios.
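For completeness, the Win32 calls behind the "ideal processor" behavior the article describes are SetThreadIdealProcessor (a soft hint) and SetThreadAffinityMask (a hard constraint). A minimal sketch, with an illustrative choice of CPU 2:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        HANDLE self = GetCurrentThread();

        /* Soft hint: prefer logical processor 2, but allow the
         * scheduler to run the thread elsewhere when CPU 2 is busy. */
        if (SetThreadIdealProcessor(self, 2) == (DWORD)-1)
            fprintf(stderr, "SetThreadIdealProcessor failed: %lu\n",
                    (unsigned long)GetLastError());

        /* Hard constraint: allow the thread to run on CPU 2 only.
         * Use sparingly -- this forbids the scheduler from helping you. */
        if (SetThreadAffinityMask(self, (DWORD_PTR)1 << 2) == 0)
            fprintf(stderr, "SetThreadAffinityMask failed: %lu\n",
                    (unsigned long)GetLastError());

        /* ... thread work ... */
        return 0;
    }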

I am not even sure you can pin processes to a specific CPU on Linux. So my answer is "NO" -- let the OS handle it; it's smarter than you most of the time.

Edit: It seems that on Win32 you have some control over which CPUs a process will run on. Now I'm just waiting for someone to prove me wrong on Linux/POSIX as well ...
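Taking up that invitation: Linux does support this, via the sched_setaffinity(2) system call (available since kernel 2.5.8; the taskset command wraps it). A minimal sketch that pins the calling process to CPU 0 and reads the mask back to confirm:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;

        /* Pin the calling process (pid 0 = self) to CPU 0. */
        CPU_ZERO(&set);
        CPU_SET(0, &set);
        if (sched_setaffinity(0, sizeof(set), &set) == -1)
            perror("sched_setaffinity");

        /* Read the mask back to confirm the pinning took effect. */
        CPU_ZERO(&set);
        if (sched_getaffinity(0, sizeof(set), &set) == -1)
            perror("sched_getaffinity");
        else
            printf("pinned to CPU 0 only: %s\n",
                   (CPU_COUNT(&set) == 1 && CPU_ISSET(0, &set))
                       ? "yes" : "no");
        return 0;
    }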

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow