Question

This question led me to wonder about thread-local storage in high-level development frameworks like Java and .NET.

Java has a ThreadLocal<T> class (and perhaps other constructs), while .NET has data slots, and soon a ThreadLocal<T> class of its own. (It also has the ThreadStaticAttribute, but I'm particularly interested in thread-local storage for member data.) Most other modern development environments provide one or more mechanisms for it, either at the language or framework level.

What problems does thread-local storage solve, or what advantages does thread-local storage provide over the standard object-oriented idiom of creating separate object instances to contain thread-local data? In other words, how is this:

// Thread local storage approach - start 200 threads using the same object
// Each thread creates a copy of any thread-local data
ThreadLocalInstance instance = new ThreadLocalInstance();
for(int i=0; i < 200; i++) {
    ThreadStart threadStart = new ThreadStart(instance.DoSomething);
    new Thread(threadStart).Start();
}

superior to this?

// Normal OO approach: create 200 objects, start a new thread on each
for(int i=0; i < 200; i++) {
    StandardInstance standardInstance = new StandardInstance();
    ThreadStart threadStart = new ThreadStart(standardInstance.DoSomething);      
    new Thread(threadStart).Start();
}

I can see that using a single object with thread-local storage could be slightly more memory-efficient and require fewer processor resources due to fewer allocations (and constructions). Are there other advantages?


Solution

What problems does thread-local storage solve, or what advantages does thread-local storage provide over the standard object-oriented idiom of creating separate object instances to contain thread-local data?

Thread-local storage lets you give each running thread its own instance of a class. That is very valuable when working with non-thread-safe classes, or when trying to avoid the synchronization that shared state would otherwise require.
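To illustrate the non-thread-safe case, here is a minimal sketch in Java (this thread covers both Java and .NET). It assumes java.text.SimpleDateFormat as the non-thread-safe class; PerThreadFormatter and its methods are hypothetical names, not part of any library.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper: SimpleDateFormat is not thread-safe, so every thread
// lazily gets its own instance instead of sharing one behind a lock.
class PerThreadFormatter {
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(() -> {
            SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
            f.setTimeZone(TimeZone.getTimeZone("UTC"));
            return f;
        });

    // No synchronization needed: the formatter used here is never visible
    // to any other thread.
    static String format(Date date) {
        return FORMAT.get().format(date);
    }

    // Exposed only to show that one thread keeps reusing the same instance.
    static SimpleDateFormat current() {
        return FORMAT.get();
    }
}
```

Calling PerThreadFormatter.format(...) from any number of threads is safe, and each thread pays for constructing a formatter only once.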

As for the advantage vs. your example - if you are spawning a single thread, there is little or no advantage to using thread local storage over passing in an instance. ThreadLocal<T> and similar constructs become incredibly valuable, however, when working (directly or indirectly) with a ThreadPool.

For example, I recently worked on a process that does some very heavy computation using the new Task Parallel Library in .NET. Certain portions of the computation can be cached, and when the cache contains a match we can shave quite a bit of time off processing one element. However, the cached data had a high memory requirement, so we didn't want to cache more than the last processing step.

Sharing this cache across threads is problematic, though. To do so, we'd have to synchronize access to it and add extra checks inside our classes to make them thread-safe.

Instead of doing this, I rewrote the algorithm so that each thread maintains its own private cache in a ThreadLocal<T>. Since the partitioning scheme the TPL uses tends to keep blocks of elements together, each thread's local cache tended to contain the values it required.

This eliminated the synchronization issues while still letting us keep our caching in place. The overall benefit was quite large in this situation.
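The per-thread cache idea can be sketched roughly as follows. This is not the original .NET code: it is a Java approximation that assumes a fixed thread pool in place of the TPL, a single-entry "last result" cache, and a trivial stand-in for the expensive step. All names (PerThreadCacheDemo, LastResult, expensive) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class PerThreadCacheDemo {
    // Hypothetical single-entry cache: remembers only the most recent key.
    static final class LastResult {
        boolean filled;
        long key;
        long value;
    }

    // Each pool thread lazily gets its own cache, so no locking is needed.
    private static final ThreadLocal<LastResult> CACHE =
        ThreadLocal.withInitial(LastResult::new);

    // Counts real computations so the effect of the cache is observable.
    static final AtomicInteger computations = new AtomicInteger();

    // Stand-in for the expensive step (assumption: a pure function of the key).
    static long expensive(long key) {
        computations.incrementAndGet();
        return key * key;
    }

    static long cachedExpensive(long key) {
        LastResult c = CACHE.get();
        if (!c.filled || c.key != key) {
            c.value = expensive(key);
            c.key = key;
            c.filled = true;
        }
        return c.value;
    }

    // Processes all inputs on a fixed-size pool and returns the sum of results.
    static long processAll(long[] inputs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (long in : inputs) {
                Callable<Long> task = () -> cachedExpensive(in);
                futures.add(pool.submit(task));
            }
            long sum = 0;
            for (Future<Long> f : futures) {
                sum += f.get();
            }
            return sum;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

How often the cache hits depends on which thread happens to pick up which element, which is why a partitioning scheme that keeps related elements on the same thread matters so much here.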

For a more concrete example, take a look at this blog post I wrote on aggregation using the TPL. Internally, the Parallel class uses a ThreadLocal<TLocal> whenever you use the ForEach overload that keeps local state (and the Parallel.For<TLocal> methods, too). This is how the local state is kept separate per thread to avoid locking.

OTHER TIPS

Just occasionally, it's helpful to have thread-local state. One example is for a log context - it can be useful to set the context of which request you're currently servicing, or something similar, so that you can collate all the logs to do with that request.
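A log context along these lines might look like the sketch below, written in Java. LogContext and RequestHandler are hypothetical names, and the request id is assumed to be a plain string.

```java
// Hypothetical log context: remembers which request the current thread serves.
class LogContext {
    private static final ThreadLocal<String> REQUEST_ID = new ThreadLocal<>();

    static void begin(String requestId) {
        REQUEST_ID.set(requestId);
    }

    // Clearing matters on pooled threads, which are reused for later requests.
    static void end() {
        REQUEST_ID.remove();
    }

    // Tags a message with the request this thread is currently servicing.
    static String line(String message) {
        String id = REQUEST_ID.get();
        return "[" + (id == null ? "no-request" : id) + "] " + message;
    }
}

class RequestHandler {
    static String handle(String requestId) {
        LogContext.begin(requestId);
        try {
            return loadData(); // note: no request id in the signature
        } finally {
            LogContext.end();
        }
    }

    private static String loadData() {
        return LogContext.line("loading data");
    }
}
```

Every log line produced while handling a request carries the same id, so the lines can be collated afterwards without threading the id through each method.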

Another good example is System.Random in .NET. It's fairly common knowledge that you shouldn't create a new instance every time you want to use Random, so some people create a single instance and put it in a static variable... but that's awkward because Random isn't thread-safe. Instead, you really want one instance per thread, seeded appropriately. ThreadLocal<T> works great for this.

Similar examples are the culture associated with a thread, or the security context.

In general, it's a case of not wanting to pass too much context round all over the place. You could make every single method call include a "RandomContext" or a "LogContext" - but it would get in the way of your API's cleanliness - and the chain would be broken if you ever had to call into another API which would call back to yours through a virtual method or something similar.

In my view, thread-local data is something that should be avoided where possible - but just occasionally it can be really useful.

I would say that in most cases you can get away with it being static - but just occasionally you might want per-instance, per-thread information. Again, it's worth using your judgement to see where it's useful.
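The per-instance, per-thread case corresponds to holding the thread-local in a non-static field. Here is a small Java sketch under that assumption; Tally and TallyDemo are hypothetical names.

```java
class Tally {
    // Non-static: every Tally instance owns a separate slot per thread.
    private final ThreadLocal<int[]> count =
        ThreadLocal.withInitial(() -> new int[1]);

    void increment() {
        count.get()[0]++;
    }

    // Returns the count as seen by the calling thread for this instance.
    int get() {
        return count.get()[0];
    }
}

class TallyDemo {
    // Increments on a fresh thread and returns what that thread observed.
    static int countOnNewThread(Tally tally, int times) {
        int[] seen = new int[1];
        Thread t = new Thread(() -> {
            for (int i = 0; i < times; i++) {
                tally.increment();
            }
            seen[0] = tally.get();
        });
        t.start();
        try {
            t.join(); // join makes the worker's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return seen[0];
    }
}
```

With this shape, one shared object can be handed to many threads (as in the question's first snippet) while each thread still sees only its own count, and a second Tally instance stays independent of the first.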

It helps with passing a value down the stack. It comes in handy when you need a value deep in the call stack but there is no way (or no benefit) to pass it to the place it is needed as a method parameter. The above example of storing the current HttpRequest in a ThreadLocal is a good illustration: the alternative would be to pass the HttpRequest as a parameter down the stack to wherever it is needed.

In Java, thread-local storage can be useful in a web application where a single request is typically processed by a given thread. Take Spring Security, for instance: the security filter performs the authentication and then stores the user's credentials in a thread-local variable.

This allows the actual request-processing code to access the current user's request/authentication information without having to inject anything else into the code.
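The pattern can be sketched without any framework. The code below is not Spring Security's API: CurrentUser, AuthFilter, AccountService, and the token format are all invented for illustration, assuming the filter and the request-processing code run on the same thread.

```java
import java.util.function.Supplier;

// Hypothetical holder for the authenticated user of the current thread.
class CurrentUser {
    private static final ThreadLocal<String> USER = new ThreadLocal<>();

    static void set(String name) { USER.set(name); }
    static String get() { return USER.get(); }
    static void clear() { USER.remove(); }
}

class AuthFilter {
    // Authenticates, then runs the rest of the request on the same thread.
    static String doFilter(String token, Supplier<String> next) {
        CurrentUser.set(authenticate(token));
        try {
            return next.get();
        } finally {
            CurrentUser.clear(); // do not leak the user into the next request
        }
    }

    // Toy authentication (assumption): tokens look like "user:<name>".
    private static String authenticate(String token) {
        return token.startsWith("user:") ? token.substring(5) : "anonymous";
    }
}

class AccountService {
    // Business code reads the user without receiving it as a parameter.
    static String greeting() {
        return "Hello, " + CurrentUser.get();
    }
}
```

For example, AuthFilter.doFilter("user:alice", AccountService::greeting) produces a greeting for alice, and the holder is empty again once the filter returns.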

Suppose you want to make a series of calls that all access the same variable. You could pass it as an argument in every call:

function startComputingA(other args) {
  global_v = create // declared locally
  call A2(other args, global_v)
  call A3(other args, global_v)
}

function A2(other args, global_v) {
  call A3(other args, global_v)
}

function A3(other args, global_v) {
  call A4(other args, global_v)
}

Every function must declare the global_v argument, which is tedious. Alternatively, you can store the variable in the global scope, which makes it "virtually" available to every routine:

variable global_v;
function A() { // use global_v and call B() }
function B() { // use global_v and call C() }

However, another thread may start executing some of these functions in the meantime, which would corrupt your global variable. So you want the variable to be visible globally to all routines, yet not shared between threads: every thread should have its own copy of global_v. This is where thread-local storage becomes indispensable. You declare global_v as a thread-local variable, so any thread can access global_v from anywhere, but each thread sees its own copy.
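In Java, the thread-local version of global_v could look roughly like this. It is a sketch: the Routines class, the a/b/c routines, and the choice of a list as the shared value are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

class Routines {
    // Reachable from every routine, but each thread gets its own list.
    static final ThreadLocal<List<String>> GLOBAL_V =
        ThreadLocal.withInitial(ArrayList::new);

    // None of the routines takes global_v as a parameter.
    static void a() { GLOBAL_V.get().add("A"); b(); }
    static void b() { GLOBAL_V.get().add("B"); c(); }
    static void c() { GLOBAL_V.get().add("C"); }

    // Runs a() on several threads at once and collects each thread's copy.
    static List<List<String>> runConcurrently(int threads) {
        List<List<String>> views = new ArrayList<>();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                a();
                List<String> mine = GLOBAL_V.get();
                synchronized (views) {
                    views.add(mine);
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return views;
    }
}
```

Even though the threads run the same routines concurrently, every collected list contains exactly the entries written by its own thread; with a plain static list the entries from different threads would be interleaved.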

Licensed under: CC-BY-SA with attribution