Question

I've been looking at the problem of writing a concurrent Multimap, and I have an implementation backed by the Google Guava AbstractSetMultimap and a MapMaker computing map that creates on demand the values-collections as a set view over a ConcurrentHashMap. With some care over the view collections and various wrappers, I think this gets pretty close.

The big problem, which has already been discussed by others who have tried this, appears to be that of removing the values-collections from the underlying map when they become empty, without introducing race conditions.

A couple of options appear to exist.

  • leave the empty collections there. This will leak some CHMs, but I believe it to be at least correct.
  • try optimistically to remove the collection when empty and compensate if anything else appears in it. This is full of races and seems intrinsically impossible to fix.
  • synchronise everything on the values-collection, which would at least allow this removal, but at the cost of any concurrency after the initial lookup by key.
  • for a smaller penalty (perhaps, depending on usage patterns?), perhaps synchronise on values-collection creation and removal, need to check whether that covers everything.

Questions:

  • Does anyone know of any better implementation than this? Can we do better composing bits of MapMaker, or does this need a specialised ConcurrentHashMultimap written from scratch?
  • If it's difficult to improve much on this, is this leak likely to be much of an issue in practice? Notable collections such as java.util.HashMap, juc.ConcurrentHashMap and ArrayDeque do not resize the backing store downwards, and ArrayList does not do so automatically. As long as we clear out the objects, I wonder whether this will matter too much.

Thanks


Edit: see also the discussion here on the guava mailing list.


Edit 2: I've since written this up. Please see this Google code area for an implementation. I would greatly appreciate any feedback from anyone who tries it, there rather than here.

Was it helpful?

Solution

As a follow-up, here are some details I omitted from the earlier discussions, which you linked to, about my concurrent multimap implementation.

That implementation followed your first suggestion: leaving empty collections in the backing map. The following live-view behavior complicates your other suggestions.

Multimap<String, Integer> multimap = HashMultimap.create();
Set<Integer> set = multimap.get("foo");
multimap.put("foo", 1);
int count = set.size();   // equals 1

For a real-world application, as opposed to a collection library class, something less than a fully concurrent multimap would probably suffice. You could define your own class that implements a subset of the Multimap interface or a limited selection of concurrency guarantees. Or, your application logic could minimize contention of a synchronized multimap to avoid performance problems.

OTHER TIPS

I asked the same question earlier, and ended up implementing 4 different implementations.

The question: High-performance Concurrent MultiMap Java/Scala

The impl (I call it Index) http://github.com/jboner/akka/blob/master/akka-actor/src/main/scala/actor/ActorRegistry.scala#L314

If you really don't want to leak the empty collection you could try to atomically replace it with a per-key placeholder Future. That way concurrent add/remove or add/add should be able to reach consistent state when expanding again.

Using immutable collections as the values is the best way to solve/simplify the basic concurrency issues as you can then remove with the atomic replace method. Unfortunately, there aren't immutable collections with fast copy/update profiles in general use, so you usually need to do quite expensive copy everything.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top