How does x86 pause instruction work in spinlock and can it be used in other scenarios?

https://stackoverflow.com/questions/4725676

12-10-2019
Question

The pause instruction is commonly used in the loop that tests a spinlock while some other thread owns it, to mitigate the tight loop. It's said to be equivalent to a few NOP instructions. Could somebody tell me how exactly it helps with spinlock optimization? It seems to me that even NOP instructions are a waste of CPU time. Do they decrease CPU usage?

My other question is whether I could use the pause instruction for similar purposes. For example, I have a busy thread that keeps scanning some places (e.g. a queue) to retrieve new nodes; however, sometimes the queue is empty and the thread is just wasting CPU time. Putting the thread to sleep and having other threads wake it up may be an option, but the thread is critical, so I don't want to make it sleep. Could the pause instruction help reduce the CPU usage in my case? Currently it uses 100% of a physical core.

Thanks.

Solution

PAUSE notifies the CPU that this is a spinlock wait loop so memory and cache accesses may be optimized. See also pause instruction in x86 for some more details about avoiding the memory-order mis-speculation when leaving the spin-loop.

PAUSE may actually stall the CPU for a short time to save power. Older CPUs decode it as REP NOP, so you don't have to check whether it's supported: they simply do nothing (NOP) as fast as possible.
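To make the spin-wait pattern concrete, here is a minimal sketch of a test-and-test-and-set spinlock with PAUSE in the wait loop. It assumes a C11 compiler and the `_mm_pause()` intrinsic from `<immintrin.h>`; the names `demo_spinlock`, `demo_spin_lock`, `demo_spin_unlock` and `cpu_relax` are made up for this illustration and are not from any particular library.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()   /* emits the PAUSE instruction */
#else
#define cpu_relax() ((void)0)     /* assumption: plain no-op on other architectures */
#endif

/* Hypothetical lock type for illustration only. */
typedef struct {
    atomic_bool locked;
} demo_spinlock;

static void demo_spin_lock(demo_spinlock *l) {
    for (;;) {
        /* Try to take the lock; exchange returns the previous value. */
        if (!atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
            return;
        /* Lock is held by someone else: spin on a read-only load and
           tell the CPU via PAUSE that this is a spin-wait loop. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            cpu_relax();
    }
}

static void demo_spin_unlock(demo_spinlock *l) {
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

A caller would wrap a short critical section in `demo_spin_lock(&l)` and `demo_spin_unlock(&l)`. The inner loop only reads the flag, so waiting threads do not keep writing to the lock's cache line while they spin.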

Update: I don't think it's a good idea to use PAUSE in queue checking unless you are going to make your queue spinlock-like (and there is no obvious way to do it).

Spinning for a very long time is still very bad, even with PAUSE.

OTHER TIPS

A processor suffers a severe performance penalty when exiting the loop because it detects a possible memory order violation. The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that a PAUSE instruction be placed in all spin-wait loops. An additional function of the PAUSE instruction is to reduce the power consumed by Intel processors.

[source: Intel manual]

Intel recommends using the PAUSE instruction only when the spin-loop is very short.

As I understand from your question, the waits in your case are very long. In this case, spin-loops are not recommended.

You wrote that you have a "thread which keeps scanning some places (e.g. a queue) to retrieve new nodes".

In such a case, Intel recommends using the synchronization API functions of your operating system. For example, you can create an event that is signaled when a new node appears in the queue, and just wait for this event using WaitForSingleObject(Handle, INFINITE). The queue signals this event whenever a new node appears.
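The same idea can be sketched portably. The example below is not the Windows event API mentioned above; it uses a POSIX mutex and condition variable as a stand-in, so that the consumer sleeps in the kernel instead of spinning. The queue type and the `demo_queue_*` names are hypothetical, and the fixed capacity is an arbitrary choice for the sketch.

```c
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP 16   /* arbitrary capacity for this sketch */

typedef struct {
    int items[QUEUE_CAP];
    int head, count;
    pthread_mutex_t mu;
    pthread_cond_t nonempty;   /* plays the role of the "new node" event */
} demo_queue;

static void demo_queue_init(demo_queue *q) {
    q->head = 0;
    q->count = 0;
    pthread_mutex_init(&q->mu, NULL);
    pthread_cond_init(&q->nonempty, NULL);
}

/* Producer: add a node and wake one waiting consumer.
   Returns 0 on success, -1 if the queue is full. */
static int demo_queue_push(demo_queue *q, int v) {
    int rc = -1;
    pthread_mutex_lock(&q->mu);
    if (q->count < QUEUE_CAP) {
        q->items[(q->head + q->count) % QUEUE_CAP] = v;
        q->count++;
        rc = 0;
        pthread_cond_signal(&q->nonempty);
    }
    pthread_mutex_unlock(&q->mu);
    return rc;
}

/* Consumer: block (using no CPU) until a node is available. */
static int demo_queue_pop(demo_queue *q) {
    pthread_mutex_lock(&q->mu);
    while (q->count == 0)   /* loop guards against spurious wakeups */
        pthread_cond_wait(&q->nonempty, &q->mu);
    int v = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_mutex_unlock(&q->mu);
    return v;
}
```

With this shape, the scanning thread calls `demo_queue_pop()` in its loop and consumes no CPU time while the queue is empty; the cost is the wake-up latency of the OS scheduler.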

According to the Intel Optimization Manual, the PAUSE instruction is typically used with software threads executing on two logical processors located in the same processor core, waiting for a lock to be released. Such short wait loops tend to last between tens and a few hundreds of cycles (i.e. 20-500 CPU cycles), so performance-wise it is more beneficial to wait while occupying the CPU than to yield to the OS.

500 CPU cycles on a 4500 MHz Core i7 7700K processor take about 0.00000011 seconds (roughly 0.11 microseconds), i.e. about 1/9,000,000th of a second: the CPU can run this 500-cycle loop about 9 million times per second.
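The conversion from cycles to wall-clock time is simple enough to check with a small helper. This is only an arithmetic sketch; the function names are made up, and it assumes the core runs at a constant frequency.

```c
/* Convert a cycle count to nanoseconds at a clock frequency given in GHz.
   1 GHz is one cycle per nanosecond, so ns = cycles / GHz. */
static double cycles_to_ns(double cycles, double ghz) {
    return cycles / ghz;
}

/* How many back-to-back waits of `cycles` cycles fit into one second. */
static double waits_per_second(double cycles, double ghz) {
    return ghz * 1e9 / cycles;
}
```

For example, `cycles_to_ns(500, 4.5)` is about 111 ns and `waits_per_second(500, 4.5)` is about 9 million. The same helper gives about 31.19 ns for 131 cycles at 4.2 GHz.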

As you see, this PAUSE instruction is for really short periods of time.

On the other hand, each call to an API function like Sleep() incurs the cost of a context switch, which can be 10000+ cycles; it also pays for the ring 3 to ring 0 transitions, which can be 1000+ cycles.

If there are more threads than available processor cores (multiplied by the number of hyper-threads per core, if hyper-threading is present), and a thread gets switched out in the middle of a critical section, another thread waiting for that critical section may wait a really long time, at least 10000+ cycles, so the PAUSE instruction will be futile.

Please see these articles for more information:

When the wait loop is expected to last for thousands of cycles or more, it is preferable to yield to the operating system by calling one of the OS synchronization API functions, such as WaitForSingleObject on Windows OS.
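One common way to combine both recommendations is to spin with PAUSE for a bounded number of iterations and only then give up the CPU. The sketch below is an illustration under assumptions, not Intel's reference code: `wait_until_set` and `cpu_relax` are hypothetical names, the spin budget is chosen by the caller, and POSIX `sched_yield()` stands in for a real blocking primitive such as an event or condition variable.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()   /* emits the PAUSE instruction */
#else
#define cpu_relax() ((void)0)     /* assumption: plain no-op on other architectures */
#endif

/* Wait until *flag becomes true.
   Returns how many PAUSE iterations were spent in the spin phase. */
static int wait_until_set(atomic_bool *flag, int spin_limit) {
    int spins = 0;

    /* Phase 1: the wait is expected to be short, so stay on the CPU. */
    while (!atomic_load_explicit(flag, memory_order_acquire)) {
        if (spins >= spin_limit)
            break;
        cpu_relax();
        spins++;
    }

    /* Phase 2: the wait turned out to be long, so hand the CPU back
       to the scheduler between checks instead of burning cycles. */
    while (!atomic_load_explicit(flag, memory_order_acquire))
        sched_yield();

    return spins;
}
```

If the flag is already set, the function returns immediately with 0 spins. A production version would normally replace the yield loop with a blocking wait, so that the thread is woken by the producer rather than polling.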

As a conclusion: in your scenario, the PAUSE instruction won't be the best choice, since your waiting time is long while PAUSE is intended for very short loops. PAUSE takes just 131 cycles on Skylake or later processors. For example, that is just 31.19 ns on an Intel Core i7-7700K CPU @ 4.20GHz (Kaby Lake).

On earlier processors, like Haswell, it takes about 9 cycles. That is 2.81 ns on an Intel Core i5-4430 @ 3GHz. So, for long loops, it's better to relinquish control to other threads using the OS synchronization API functions than to occupy the CPU with a PAUSE loop.

The PAUSE instruction also appears to be used in hyper-threading processors to mitigate performance impact on other hyper threads, presumably by relinquishing more CPU time to them.

The following Intel article outlines this, and not surprisingly recommends avoiding busy wait loops on such processors: https://software.intel.com/en-us/articles/long-duration-spin-wait-loops-on-hyper-threading-technology-enabled-intel-processors

Licensed under: CC-BY-SA with attribution

