Question

I'm wondering what is the "best" way to make data thread-safe.

Specifically, I need to protect a linked list across multiple threads: one thread might try to read from it while another adds or removes data, or even frees the entire list. I've been reading about locks; they seem to be the most commonly used approach, but apparently they can be problematic (deadlocks). I've also read about atomic operations and thread-local storage.

In your opinion, what would be my best course of action? What's the approach that most programmers use, and for what reason?


Solution

One approach that is not heavily used, but quite sound, is to designate one special-purpose thread to own each "shared" structure. That thread generally sits waiting on a thread-safe queue (in Python, for example, a Queue.Queue instance) for work requests that read or change the shared structure. Some requests expect a response (they pass along their own queue, on which the response is placed when ready) and some don't. This approach entirely serializes all access to the shared resource, maps easily onto a multi-process or distributed architecture (almost brainlessly, in Python, with multiprocessing;-), and guarantees freedom from deadlocks and race conditions, as long as the underlying queue object is well-programmed once and for all.
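To make the idea concrete, here is a minimal sketch of that owner-thread pattern in Python. The class name ListOwner and the operation names ("append", "snapshot", "stop") are illustrative choices, not a standard API; the point is that only the owner thread ever touches the underlying list, and everyone else talks to it through the queue.

```python
import queue
import threading

class ListOwner(threading.Thread):
    """One thread owns the list; all access goes through its request queue."""

    def __init__(self):
        super().__init__(daemon=True)
        self.requests = queue.Queue()
        self._data = []  # only the owner thread ever touches this

    def run(self):
        while True:
            op, arg, reply = self.requests.get()
            if op == "stop":
                break
            elif op == "append":
                self._data.append(arg)
            elif op == "snapshot":
                # Hand back a copy so the caller can't mutate shared state.
                reply.put(list(self._data))

    # Convenience wrappers, callable from any thread:
    def append(self, item):
        self.requests.put(("append", item, None))

    def snapshot(self):
        reply = queue.Queue()  # per-request reply queue, as described above
        self.requests.put(("snapshot", None, reply))
        return reply.get()

    def stop(self):
        self.requests.put(("stop", None, None))
```

Because the request queue is FIFO, a snapshot request is guaranteed to observe all appends submitted before it from the same thread, with no locks in the caller's code at all.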

It basically turns the hell of shared data structures into the paradise of message-passing concurrency architectures.

OTOH, it may carry a tad more overhead than slugging it out the hard way with locks and the like;-).

OTHER TIPS

You could consider an immutable collection. Much as a string in .NET has methods such as Replace and Insert that don't modify the string but instead create a new one, a LinkedList collection can be designed to be immutable as well. In fact, a LinkedList is fairly simple to implement this way compared to some other collection data structures.
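Here is a minimal sketch of what such a persistent (immutable) linked list looks like, written in Python for consistency with the rest of the answer; the names Node, cons, and to_list are illustrative. Because nodes are never modified after creation, a reader holding an old head can keep traversing safely while writers build new versions that share the old nodes.

```python
from typing import Any, NamedTuple, Optional

class Node(NamedTuple):
    """One immutable cell of a singly linked list."""
    value: Any
    rest: Optional["Node"]  # None marks the end of the list

def cons(value, lst):
    """Return a new list with `value` prepended; `lst` itself is untouched."""
    return Node(value, lst)

def to_list(lst):
    """Walk the linked list and collect its values into a Python list."""
    out = []
    while lst is not None:
        out.append(lst.value)
        lst = lst.rest
    return out
```

For example, after `a = cons(2, cons(1, None))` and `b = cons(3, a)`, both versions coexist: `b` sees three elements while `a` still sees its original two, with the tail nodes shared between them.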

Here's a link to a blog post discussing immutable collections and some implementations in .NET.

http://blogs.msdn.com/jaredpar/archive/2009/04/06/immutable-vs-mutable-collection-performance.aspx

Always remember the most important rule of thread safety: know all the critical sections of your code inside out. And by that, know them like your ABCs. Only if you can identify them instantly when asked will you know which areas need your thread-safety mechanisms.

After that, remember the rules of thumb:

  • Look out for all your global variables / variables on the heap.
  • Make sure your subroutines are re-entrant.
  • Make sure access to shared data is serialized.
  • Make sure there are no indirect accesses through pointers.

(I'm sure others can add more.)

The "best" way, from a safety point of view, is to put a lock on the entire data structure, so that only one thread can touch it at a time.

Once you decide to lock less than the entire structure, presumably for performance reasons, the details of doing this are messy and differ for every data structure, and even variants of the same structure.

My suggestion is to

  1. Start with a global lock on your data structure. Profile your program to see if it's really a problem.

  2. If it is a problem, consider whether there's some other way to distribute the problem. Can you minimize the amount of data in the data structure in question, so that it need not be accessed so often or for so long? If it's a queuing system, for example, perhaps you can keep a local queue per thread, and only move things into or out of a global queue when a local queue becomes over- or under-loaded.

  3. Look at data structures designed to help reduce contention for the particular type of thing you're doing, and implement them carefully and precisely, erring on the side of safety. For the queuing example, work-stealing queues might be what you need.
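Tip 2 above can be sketched as follows, again in Python. Each thread batches work in a private deque and only touches the shared global queue when the local one overflows or runs dry; the threshold LOCAL_MAX and the "move half" policy are arbitrary illustrations, not tuned values.

```python
import collections
import queue
import threading

GLOBAL_QUEUE = queue.Queue()   # shared, internally synchronized
LOCAL_MAX = 64                 # spill to the global queue beyond this size
_local = threading.local()

def _local_queue():
    # Lazily create one private deque per thread.
    if not hasattr(_local, "q"):
        _local.q = collections.deque()
    return _local.q

def put(item):
    q = _local_queue()
    q.append(item)
    if len(q) > LOCAL_MAX:
        # Over-loaded: move half of the oldest items to the global queue.
        for _ in range(LOCAL_MAX // 2):
            GLOBAL_QUEUE.put(q.popleft())

def get():
    q = _local_queue()
    if not q:
        # Under-loaded: refill from the global queue (raises queue.Empty
        # if there is no work anywhere).
        q.append(GLOBAL_QUEUE.get_nowait())
    return q.popleft()
```

Most put/get calls never touch the shared queue at all, which is exactly how this scheme reduces contention; items may come back out of global order, which is usually acceptable for work queues.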

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow