سؤال

I have a piece of code that can be executed by multiple threads that needs to perform an I/O-bound operation in order to initialize a shared resource that is stored in a ConcurrentMap. I need to make this code thread safe and avoid unnecessary calls to initialize the shared resource. Here's the buggy code:

    private ConcurrentMap<String, Resource> map;

    // .....

    String key = "somekey";
    Resource resource;
    if (map.containsKey(key)) {
        resource = map.get(key);
    } else {
        resource = getResource(key); // I/O-bound, expensive operation
        map.put(key, resource);
    }

With the above code, multiple threads may check the ConcurrentMap and see that the resource isn't there, and all attempt to call getResource() which is expensive. In order to ensure only a single initialization of the shared resource and to make the code efficient once the resource has been initialized, I want to do something like this:

    String key = "somekey";
    Resource resource;
    if (!map.containsKey(key)) {
        synchronized (map) {
            if (!map.containsKey(key)) {
                resource = getResource(key);
                map.put(key, resource);
            }
        }
    }

Is this a safe version of double checked locking? It seems to me that since the checks are called on ConcurrentMap, it behaves like a shared resource that is declared to be volatile and thus prevents any of the "partial initialization" problems that may happen.

هل كانت مفيدة؟

المحلول

yes it' safe.

If map.containsKey(key) is true, according to doc, map.put(key, resource) happens before it. Therefore getResource(key) happens before resource = map.get(key), everything is safe and sound.

نصائح أخرى

If you can use external libraries, take a look at Guava's MapMaker.makeComputingMap(). It's tailor-made for what you're trying to do.

Why not use the putIfAbsent() method on ConcurrentMap?

if(!map.containsKey(key)){
  map.putIfAbsent(key, getResource(key));
}

Conceivably you might call getResource() more than once, but it won't happen a bunch of times. Simpler code is less likely to bite you.

In general, double-checked locking is safe if the variable you're synchronizing on is marked volatile. But you're better off synchronizing the entire function:


public synchronized Resource getResource(String key) {
  Resource resource = map.get(key);
  if (resource == null) {
    resource = expensiveGetResourceOperation(key);    
    map.put(key, resource);
  }
  return resource;
}

The performance hit will be tiny, and you'll be certain that there will be no sync problems.

Edit:

This is actually faster than the alternatives, because you won't have to do two calls to the map in most cases. The only extra operation is the null check, and the cost of that is close to zero.

Second edit:

Also, you don't have to use ConcurrentMap. A regular HashMap will do it. Faster still.

No need for that - ConcurrentMap supports this as with its special atomic putIfAbsent method.

Don't reinvent the wheel: Always use the API where possible.

The verdict is in. I timed 3 different solutions in nanosecond accuracy, since after all the initial question was about performance:

Fully synching the function on a regular HashMap:

synchronized (map) {

   Object result = map.get(key);
   if (result == null) {
      result = new Object();
      map.put(key, result);
   }                
   return result;
}

first invocation: 15,000 nanoseconds, subsequent invocations: 700 nanoseconds

Using the double check lock with a ConcurrentHashMap:

if (!map.containsKey(key)) {
   synchronized (map) {
      if (!map.containsKey(key)) {
         map.put(key, new Object());
      }
   }
} 
return map.get(key);

first invocation: 15,000 nanoseconds, subsequent invocations: 1500 nanoseconds

A different flavor of double checked ConcurrentHashMap:

Object result = map.get(key);
if (result == null) {
   synchronized (map) {
      if (!map.containsKey(key)) {
         result = new Object();
         map.put(key, result);
      } else {
         result = map.get(key);
      }
   }
} 

return result;

first invocation: 15,000 nanoseconds, subsequent invocations: 1000 nanoseconds

You can see that the biggest cost was on the first invocation, but was similar for all 3. Subsequent invocations were the fastest on the regular HashMap with method sync like user237815 suggested but only by 300 NANO seocnds. And after all we are talking about NANO seconds here which means a BILLIONTH of a second.

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top