Java Multimap<String,String> with Trove

Question 1

Trove4j doesn't contain a hash map specialized for String-to-String mappings; its specialized maps are built around primitive key or value types.

See http://trove4j.sourceforge.net/javadocs/gnu/trove/map/hash/package-summary.html

Question 2

Guava's Multimaps are backed by standard JDK collections, which aren't optimized for memory usage. For example, ArrayListMultimap<K, V> is backed by a HashMap<K, ArrayList<V>>, and HashMultimap<K, V> by a HashMap<K, HashSet<V>>.
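For illustration, here is a minimal JDK-only sketch of that list-multimap layout. The class `ListMultimapSketch` is hypothetical and is not Guava's code. It shows where the per-key overhead comes from: every distinct key pays for a `HashMap` entry node plus a separate `ArrayList` object with its own backing array.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a plain-JDK list multimap with the same shape
// (HashMap whose values are ArrayLists) described for ArrayListMultimap.
class ListMultimapSketch {
    private final Map<String, List<String>> map = new HashMap<>();

    // Each new key allocates a HashMap entry plus a fresh ArrayList,
    // whose backing array grows to the default capacity (10) on first add.
    void put(String key, String value) {
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Read-only view of the values for a key; empty list if the key is absent.
    List<String> get(String key) {
        List<String> values = map.get(key);
        return values == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(values);
    }

    int keyCount() {
        return map.size();
    }
}
```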

Eclipse Collections (formerly GS Collections) has Multimaps backed by its own container types, UnifiedMap and UnifiedSet. UnifiedMap uses half the memory of HashMap, and UnifiedSet uses a quarter of the memory of HashSet. The memory benefits you'll see will depend on whether you use a FastListMultimap or a UnifiedSetMultimap.

More detailed memory comparisons are available here.

Note: I am a committer for Eclipse Collections.

Question 3

You could look at memory-efficient variants of hash maps, such as sparsehash (note that it is a C++ library): https://code.google.com/p/sparsehash/
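As a rough idea of how such maps save memory in Java, here is a minimal sketch of open addressing with linear probing over one flat `Object[]` array, so no per-entry node objects are allocated. This is not a port of sparsehash, which uses its own more elaborate layout; the class `FlatStringMap` is hypothetical, and removal is deliberately left out to keep it short.

```java
import java.util.Objects;

// Hypothetical sketch: String->String map using open addressing with linear
// probing. Keys sit at even indices and values at odd indices of one array,
// so there are no per-entry node objects. Removal is not supported.
class FlatStringMap {
    private Object[] table = new Object[2 * 16]; // 16 slots (power of two)
    private int size;

    private int slotCount() {
        return table.length / 2;
    }

    // Mix high bits into low bits, similar to java.util.HashMap's spreading.
    private static int hash(String key) {
        int h = key.hashCode();
        return h ^ (h >>> 16);
    }

    String get(String key) {
        int mask = slotCount() - 1;
        // Terminates because the load factor keeps at least one slot empty.
        for (int i = hash(key) & mask; ; i = (i + 1) & mask) {
            Object k = table[2 * i];
            if (k == null) return null;
            if (k.equals(key)) return (String) table[2 * i + 1];
        }
    }

    // Returns the previous value for the key, or null if there was none.
    String put(String key, String value) {
        Objects.requireNonNull(key, "key");
        if ((size + 1) * 4 > slotCount() * 3) resize(); // keep load factor <= 0.75
        int mask = slotCount() - 1;
        for (int i = hash(key) & mask; ; i = (i + 1) & mask) {
            Object k = table[2 * i];
            if (k == null) {
                table[2 * i] = key;
                table[2 * i + 1] = value;
                size++;
                return null;
            }
            if (k.equals(key)) {
                String old = (String) table[2 * i + 1];
                table[2 * i + 1] = value;
                return old;
            }
        }
    }

    int size() {
        return size;
    }

    // Double the slot count and reinsert every existing entry.
    private void resize() {
        Object[] old = table;
        table = new Object[old.length * 2];
        size = 0;
        for (int i = 0; i < old.length; i += 2) {
            if (old[i] != null) put((String) old[i], (String) old[i + 1]);
        }
    }
}
```

The trade-off is that a flat table with linear probing has to be kept under a load limit (0.75 here), so some slots always sit empty; it still avoids the object header and pointers of a separate entry object per mapping.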

If your value strings are long enough, compression could be an option. You could also look into disk-backed solutions such as Ehcache, depending on your access patterns.
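A minimal sketch of the compression idea using only `java.util.zip`: values are stored as DEFLATE-compressed UTF-8 `byte[]` instead of `String`. The helper `CompressedStrings` is hypothetical. The compressed form carries a small header, so this only pays off for long or repetitive values, and every read costs a decompression.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: stores Strings as DEFLATE-compressed UTF-8 bytes.
final class CompressedStrings {
    private CompressedStrings() {}

    static byte[] compress(String s) {
        byte[] input = s.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    static String decompress(byte[] data) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                out.write(buf, 0, n);
                // Guard against looping forever on truncated input.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or invalid data");
                }
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("invalid compressed data", e);
        } finally {
            inflater.end();
        }
    }
}
```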

Question 4

One approach I use is a Map<String, Collection<String>> whose values start out as ArrayList<String> and get promoted to HashSet<String> once a bucket reaches some threshold, say 32 elements.

I have found this saves a lot of memory for small buckets.
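A minimal sketch of that list-to-set promotion, assuming set semantics (no duplicate values per key). The class `PromotingMultimap`, the constant `PROMOTION_THRESHOLD`, and the initial list capacity of 2 are illustrative choices, not taken from the answer.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

// Hypothetical sketch: buckets start as small ArrayLists and are promoted
// to HashSets once they reach PROMOTION_THRESHOLD elements.
class PromotingMultimap {
    static final int PROMOTION_THRESHOLD = 32;

    private final Map<String, Collection<String>> map = new HashMap<>();

    // Returns true if the value was newly added for this key.
    boolean put(String key, String value) {
        Collection<String> bucket = map.get(key);
        if (bucket == null) {
            // Small initial capacity: most buckets are assumed to stay tiny.
            bucket = new ArrayList<>(2);
            map.put(key, bucket);
        }
        if (bucket instanceof ArrayList) {
            // Linear duplicate check is cheap while the bucket is small.
            if (bucket.contains(value)) return false;
            bucket.add(value);
            if (bucket.size() >= PROMOTION_THRESHOLD) {
                // Promote: copy into a HashSet for O(1) average lookups.
                map.put(key, new HashSet<>(bucket));
            }
            return true;
        }
        return bucket.add(value); // already promoted to a HashSet
    }

    boolean containsEntry(String key, String value) {
        Collection<String> bucket = map.get(key);
        return bucket != null && bucket.contains(value);
    }

    // Read-only view of the values for a key; empty if the key is absent.
    Collection<String> get(String key) {
        Collection<String> bucket = map.get(key);
        return bucket == null
                ? Collections.<String>emptySet()
                : Collections.unmodifiableCollection(bucket);
    }

    boolean isPromoted(String key) {
        return map.get(key) instanceof HashSet;
    }
}
```

The saving comes from the common case: a bucket with a handful of values costs one small ArrayList instead of a HashSet, which wraps a whole HashMap with an entry object per element.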