Question

How do you implement an efficient and thread safe reference counting system on X86 CPUs in the C++ programming language?

I always run into the problem that the critical operations not atomic, and the available X86 Interlock operations are not sufficient for implementing the ref counting system.

The following article covers this topic, but requires special CPU instructions:

http://www.ddj.com/architect/184401888

Was it helpful?

Solution

Nowadays, you can use the Boost/TR1 shared_ptr<> smart pointer to keep your reference counted references.

Works great; no fuss, no muss. The shared_ptr<> class takes care of all the locking needed on the refcount.

OTHER TIPS

In VC++, you can use _InterlockedCompareExchange.

do
   read the count
   perform mathematical operation
   interlockedcompareexchange( destination, updated count, old count)
until the interlockedcompareexchange returns the success code.

On other platforms/compilers, use the appropriate intrinsic for the LOCK CMPXCHG instruction that MS's _InterlockedCompareExchange exposes.

Strictly speaking, you'll need to wait until C++0x to be able to write thread-safe code in pure C++.

For now, you can use Posix, or create your own platform independent wrappers around compare and swap and/or interlocked increment/decrement.

Win32 InterlockedIncrementAcquire and InterlockedDecrementRelease (if you want to be safe and care about platforms with possible reordering, hence you need to issue memory barriers at the same time) or InterlockedIncrement and InterlockedDecrement (if you are sure you will stay x86), are atomic and will do the job.

That said, Boost/TR1 shared_ptr<> will handle all of this for you, therefore unless you need to implement it on your own, you will probably do the best to stick to it.

Bear in mind that the locking is very expensive, and it happens every time you hand objects around between smart pointers - even when the object is currently owned by one thread (the smart pointer library doesn't know that).

Given this, there may be a rule of thumb applicable here (I'm happy to be corrected!)

If the follow things apply to you:

  • You have complex data structures that would be difficult to write destructors for (or where STL-style value semantics would be inappropriate, by design) so you need smart pointers to do it for you, and
  • You're using multiple threads that share these objects, and
  • You care about performance as well as correctness

... then actual garbage collection may be a better choice. Although GC has a bad reputation for performance, it's all relative. I believe it compares very favourably with locking smart pointers. It was an important part of why the CLR team chose true GC instead of something using reference counting. See this article, in particular this stark comparison of what reference assignment means if you have counting going on:

no ref-counting:

a = b;

ref counting:

if (a != null)
    if (InterlockedDecrement(ref a.m_ref) == 0)
            a.FinalRelease();

if (b != null)
    InterlockedIncrement(ref b.m_ref);

a = b;

If the instruction itself is not atomic then you need to make the section of code that updates the appropriate variable a critical section.

i.e. You need to prevent other threads entering that section of code by using some locking scheme. Of course the locks need to be atomic, but you can find an atomic locking mechanism within the pthread_mutex class.

The question of efficient: The pthread library is as efficient as it can be and still guarantee that mutex lock is atomic for your OS.

Is it expensive: Probably. But for everything that requires a guarantee there is a cost.

That particular code posted in that ddj article is adding extra complexity to account for bugs in using smart pointers.

Specifically, if you can't guarantee that the smart pointer won't change in an assignment to another smart pointer, you are doing it wrong or are doing something very unreliable to begin with. If the smart pointer can change while being assigned to another smart pointer, that means that the code doing the assignment doesn't own the smart pointer, which is suspect to begin with.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top