Question

In my Java code, I am using Guava's Multimap (com.google.common.collect.Multimap) like this:

 Multimap<Integer, Integer> index = HashMultimap.create();

Here, the Multimap key is one portion of a URL and the value is another portion of the URL (converted into an integer). I give my JVM 2560 MB (2.5 GB) of heap space (using -Xmx and -Xms). However, it can only store about 9 million such (key, value) pairs of integers (approx. 10 million). The issue is that I can only give the JVM a limited amount of memory (say 2 GB).

So, can anybody help me with the following:

1) Is there another way, or a home-baked solution, to solve this memory issue? That is, would a disk/DB-based multimap be a good solution? I read in some web articles that there are DB/disk-based solutions for this, e.g. Berkeley DB or Ehcache. Can anybody tell me which one is faster?

2) Do those disk/DB-based multimaps have performance issues (I am asking about both storing and searching)?

3) Any idea or information on how to use them, in brief.

4) Any other ideas would be welcome.

NB: I want Multimap solutions (a key can have multiple values) for the above issue, and I also have to consider the performance of storing and searching.


Solution

You certainly won't store 100 million pairs of Integer objects in 2.5 GB of memory. If I'm not mistaken, an Integer uses at least 16 bytes of memory on a 64-bit Oracle/Sun JVM (object header plus the 4-byte value, padded to the object alignment). 100 million pairs are 200 million Integers, which means 3.2 GB of memory for the Integers alone, without any containing structure.
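To make the arithmetic above easy to re-run for other sizes, here is a minimal sketch. The class name BoxedPairEstimate is made up for illustration, and the 16 bytes per Integer is the assumption from this answer, not a measured value. It also ignores the overhead of the map itself, so the real footprint would be higher.

```java
/** Back-of-the-envelope heap estimate for boxed Integer pairs (illustrative only). */
final class BoxedPairEstimate {
    // Assumption from the answer: roughly 16 bytes per Integer object.
    static final long BYTES_PER_INTEGER = 16L;

    /** Bytes used by the Integer objects alone for the given number of (key, value) pairs. */
    static long integerBytes(long pairs) {
        return pairs * 2 * BYTES_PER_INTEGER; // two Integer objects per pair
    }

    public static void main(String[] args) {
        long bytes = integerBytes(100_000_000L);
        System.out.println(bytes + " bytes"); // prints "3200000000 bytes"
    }
}
```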

With this data size you should definitely go with something that is backed by the disk, or use a server with lots of memory and/or optimized data structures (in particular, try to avoid primitive type wrappers). I have used H2 for similar tasks and found it quite good (it can use memory-mapped files to access the disk instead of explicit reads), but I don't have any comparison with other similar libraries.
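As one possible reading of "avoid primitive type wrappers", the sketch below packs each (key, value) pair into a single long and keeps all pairs in one sorted long[]. That is 8 bytes per pair, so 100 million pairs would need about 800 MB for the array. The class PackedIntMultimap is hypothetical, not part of Guava or any library. It assumes a load-then-query workload, because every lookup after new puts triggers a re-sort, and unlike HashMultimap it keeps duplicate pairs.

```java
import java.util.Arrays;

/** Hypothetical int-to-int multimap stored as one sorted long[] (8 bytes per pair). */
final class PackedIntMultimap {
    private long[] pairs;
    private int size;
    private boolean sorted = true;

    PackedIntMultimap(int initialCapacity) {
        pairs = new long[Math.max(1, initialCapacity)];
    }

    /** Key goes into the high 32 bits, value into the low 32 bits. */
    private static long pack(int key, int value) {
        return ((long) key << 32) | (value & 0xFFFFFFFFL);
    }

    void put(int key, int value) {
        if (size == pairs.length) {
            pairs = Arrays.copyOf(pairs, pairs.length * 2); // grow by doubling
        }
        pairs[size++] = pack(key, value);
        sorted = false; // sorting is deferred until the next lookup
    }

    /** Returns all values stored for the key, or an empty array if there are none. */
    int[] get(int key) {
        if (!sorted) {
            Arrays.sort(pairs, 0, size);
            sorted = true;
        }
        long lowest = (long) key << 32; // smallest packed value with this key
        int lo = 0, hi = size;
        while (lo < hi) { // binary search for the first entry >= lowest
            int mid = (lo + hi) >>> 1;
            if (pairs[mid] < lowest) lo = mid + 1; else hi = mid;
        }
        int end = lo;
        while (end < size && (int) (pairs[end] >> 32) == key) end++;
        int[] values = new int[end - lo];
        for (int i = lo; i < end; i++) values[i - lo] = (int) pairs[i]; // low 32 bits
        return values;
    }

    public static void main(String[] args) {
        PackedIntMultimap index = new PackedIntMultimap(4);
        index.put(5, 20);
        index.put(5, 10);
        index.put(7, 1);
        System.out.println(Arrays.toString(index.get(5))); // prints "[10, 20]"
    }
}
```

Sizing the array up front avoids the doubling copies, which matter at this scale because the old and new arrays briefly coexist on the heap.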

OTHER TIPS

JDBM3 is a very fast on-disk HashMap/TreeMap (B+Tree) library and is claimed to be 4x faster than Berkeley DB. Billions of records can be stored in the map. It does caching internally, so map operations are not slowed down by disk access.

DB db = DBMaker.openFile(fileName).make();              // open the on-disk store
Map<Integer,Integer> map = db.createHashMap("mapName"); // named disk-backed hash map
map.put(5, 10);
db.close();

It does not have a Multimap but the value can be a Set/List.
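A minimal sketch of that "value is a Set" idea, written against the plain java.util.Map interface. The wrapper class SetValuedMap is hypothetical, and a HashMap stands in for the disk-backed map here. The write-back after each change rests on the assumption that a disk-backed map may hand out a deserialized copy of the stored set, so modifying it in place might not be persisted.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical multimap view over any Map whose values are sets. */
final class SetValuedMap {
    private final Map<Integer, Set<Integer>> map;

    SetValuedMap(Map<Integer, Set<Integer>> backingMap) {
        this.map = backingMap;
    }

    /** Adds the value to the key's set; returns false if it was already present. */
    boolean put(int key, int value) {
        Set<Integer> values = map.get(key);
        if (values == null) {
            values = new HashSet<>();
        }
        boolean added = values.add(value);
        if (added) {
            map.put(key, values); // write back so a disk-backed map sees the change
        }
        return added;
    }

    /** Returns the values for the key, or an empty set if the key is absent. */
    Set<Integer> get(int key) {
        Set<Integer> values = map.get(key);
        return values == null ? Collections.<Integer>emptySet() : values;
    }

    public static void main(String[] args) {
        // In-memory stand-in; a disk-backed Map could be passed in instead.
        SetValuedMap index = new SetValuedMap(new HashMap<>());
        index.put(5, 10);
        index.put(5, 20);
        System.out.println(index.get(5).size()); // prints "2"
    }
}
```

With this layout every put rewrites the whole set for that key, so keys with very many values become expensive to update.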

Licensed under: CC-BY-SA with attribution