Question

I need CAS functions for multiple threads that all run on the same CPU (assume every thread is statically pinned to a selected CPU via SetThreadAffinityMask).

InterlockedCompareExchange generates LOCK CMPXCHG. The LOCK part comes with side effects such as a cache miss, a bus lock and a potential for contention with other CPUs, all of which are nice, but feel like an extravagant excess given the circumstances. Since the competing threads run on the same CPU, I assume the LOCK can be dropped, and I further assume that dropping it should improve performance.

So this is my first question - do I assume correctly?

--

I know how to generate CMPXCHG with inline assembly for the 32-bit version. Also, as per this SO thread, I know how to do it for the 64-bit version too, but only as a function call.

What I don't understand, and this is my second question, is how to generate an inlined version of it.

--

Thanks.


Solution

Not to answer my own question, but to describe a workaround, of sorts.

For CAS on boolean variables, it's possible to fall back to _bittestandset, which is slower than CMPXCHG, but has an intrinsic form in VS2010.
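As a rough sketch of that workaround: the helper name `try_acquire_flag` is hypothetical, and the non-MSVC branch is only a plain C stand-in with the same return semantics (it is not atomic). Whether the compiler actually emits a single memory-operand BTS for the intrinsic is an assumption worth checking in the disassembly.

```c
#include <assert.h>
#ifdef _MSC_VER
#include <intrin.h>   /* declares _bittestandset */
#endif

/* Hypothetical helper: try to flip one bit of *flags from 0 to 1.
   Returns 1 if this caller set the bit, 0 if it was already set.
   No LOCK prefix is involved, so this is only meant for threads
   that are all pinned to the same CPU. */
static int try_acquire_flag(long *flags, long bit)
{
    assert(bit >= 0 && bit < 32);   /* long is 32 bits wide on Windows */
#ifdef _MSC_VER
    /* _bittestandset returns the previous value of the bit. */
    return _bittestandset(flags, bit) == 0;
#else
    /* Stand-in for other compilers: same result, but NOT atomic. */
    long mask = 1L << bit;
    int was_set = (*flags & mask) != 0;
    *flags |= mask;
    return !was_set;
#endif
}
```

A caller would treat a return value of 1 as "I own the flag" and clear the bit again when done.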

Other suggestions

This is really more of a comment, but the space is a little too limited...

I doubt* you'll get the CMPXCHG instruction on its own without the use of assembly. If the region is that critical, use the Interlocked intrinsics, disassemble the output, remove the LOCK prefix and link that in. I'd do this for both the 32-bit and 64-bit variants: inline ASM is less than optimal in MSVC, because it's always treated as unsafe, which causes extra protection cruft to be inserted and may be worse than calling an external version. On the plus side, it'll also give you a more uniform code layout.

I'd also recommend you profile both solutions, with and without the LOCK, as most newer Intel CPUs implement cache-level locks that greatly reduce the performance impact of the lock (Chapter 8 of the Intel Developer Manual provides a healthy bit of insight into the exact effects of bus locking).
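For comparison, here is a minimal sketch of what an unlocked CAS could look like on a toolchain that does allow inline assembly in both 32-bit and 64-bit builds. It assumes GCC or Clang extended asm, not MSVC; the name `cas32_nolock` is hypothetical, and the fallback branch is plain C that is not atomic at all. It returns the previous value, mirroring InterlockedCompareExchange.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compare-and-swap WITHOUT the LOCK prefix.
   If *dest == expected, store desired. Always returns the old *dest.
   Only meaningful when every thread touching *dest runs on one CPU. */
static inline int32_t cas32_nolock(volatile int32_t *dest,
                                   int32_t expected, int32_t desired)
{
    assert(dest != 0);
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    int32_t prev = expected;          /* CMPXCHG compares EAX with memory */
    __asm__ __volatile__(
        "cmpxchgl %2, %1"             /* AT&T order: source reg, dest mem */
        : "+a"(prev), "+m"(*dest)     /* EAX ends up holding the old value */
        : "r"(desired)
        : "memory", "cc");
    return prev;
#else
    /* Stand-in for other targets: same result, but NOT atomic. */
    int32_t prev = *dest;
    if (prev == expected)
        *dest = desired;
    return prev;
#endif
}
```

In an MSVC build the same instruction would have to live in a separately assembled routine, as described above.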
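A rough way to compare the two variants is a single-threaded timing loop such as the sketch below. It assumes GCC or Clang (the `__sync_val_compare_and_swap` builtin and extended asm); all function names are hypothetical, and both variants are called through a function pointer, so the absolute numbers include call overhead and only the difference between them is of interest.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

typedef int32_t (*cas_fn)(volatile int32_t *, int32_t, int32_t);

/* Locked variant: the builtin emits LOCK CMPXCHG on x86. */
static int32_t cas_locked(volatile int32_t *p, int32_t expected, int32_t desired)
{
    return __sync_val_compare_and_swap(p, expected, desired);
}

/* Unlocked variant: bare CMPXCHG on x86, plain C elsewhere (NOT atomic). */
static int32_t cas_unlocked(volatile int32_t *p, int32_t expected, int32_t desired)
{
#if defined(__x86_64__) || defined(__i386__)
    int32_t prev = expected;
    __asm__ __volatile__("cmpxchgl %2, %1"
                         : "+a"(prev), "+m"(*p)
                         : "r"(desired)
                         : "memory", "cc");
    return prev;
#else
    int32_t prev = *p;
    if (prev == expected)
        *p = desired;
    return prev;
#endif
}

/* Increment a counter `iterations` times through `cas`.
   Returns elapsed CPU seconds and stores the final counter value. */
static double time_cas_loop(cas_fn cas, int32_t iterations, int32_t *final_value)
{
    volatile int32_t counter = 0;
    clock_t start = clock();
    for (int32_t i = 0; i < iterations; ++i) {
        int32_t seen;
        do {
            seen = counter;
        } while (cas(&counter, seen, seen + 1) != seen);
    }
    clock_t stop = clock();
    *final_value = counter;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Print both timings; call this from main() with e.g. 10000000 iterations. */
static void report(int32_t iterations)
{
    int32_t n_locked = 0, n_unlocked = 0;
    double t_locked = time_cas_loop(cas_locked, iterations, &n_locked);
    double t_unlocked = time_cas_loop(cas_unlocked, iterations, &n_unlocked);
    assert(n_locked == iterations && n_unlocked == iterations);
    printf("with LOCK:    %.3f s\n", t_locked);
    printf("without LOCK: %.3f s\n", t_unlocked);
}
```

This measures only the uncontended case; a realistic profile would run the actual pinned threads against the real data.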

*By "doubt" I mean: it doesn't exist as an explicit intrinsic, and compiler coercion tricks are very brittle. Nor do I know of any trick for coercing the emission of XCHG or CMPXCHG (with the exception of XCHG (E)AX,(E)AX, used as an alignment NO-OP).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow