Automatically Sharding a Java Map across multiple nodes
Question
I have a problem where I need to assemble a Map whose eventual size runs into the GBs (going on past 64GB), and I cannot assume that a user of the program will have this kind of monster machine hanging around. A nice solution would be to distribute this map across a number of machines to make a far more modest memory footprint per instance.
Does anyone know of a library/suite of tools which can perform this sharding? I do not care about replication or transactions; just spreading this memory requirement around.
Solution
I suggest that you start with Hazelcast:
It is open source, and in my opinion very easy to work with, which makes it the best framework for rapid prototyping.
As far as I know, it also performs faster than the commercial alternatives, so I wouldn't worry about performance either.
(I haven't formally benchmarked it myself.)
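Whichever grid product you pick, the underlying mechanism is the same: hash each key to one of N partitions and keep each partition on a different node. Here is a minimal in-process sketch of that idea using only the standard library; the class and method names are illustrative, not Hazelcast's API, and in a real data grid each shard would live in its own JVM with the lookup routed over the network:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of hash-based sharding: each key is routed to one of N
 * "shards" (here just in-process HashMaps) by its hash code.
 */
public class ShardedMap<K, V> {
    private final List<Map<K, V>> shards;

    public ShardedMap(int shardCount) {
        shards = new ArrayList<>(shardCount);
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
    }

    // Mask the sign bit so negative hash codes still map to a valid index.
    private Map<K, V> shardFor(K key) {
        int index = (key.hashCode() & 0x7fffffff) % shards.size();
        return shards.get(index);
    }

    public V put(K key, V value) {
        return shardFor(key).put(key, value);
    }

    public V get(K key) {
        return shardFor(key).get(key);
    }
}
```

A distributed map library does essentially this, plus partition rebalancing when nodes join or leave.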
OTHER TIPS
Terracotta might be useful; have a look here.
It's a clustered JVM, so how well it performs will likely depend on how often you update the map.
Must it be open source? If not, Oracle Coherence can do it.
You may be able to solve your problem by using a database instead. Something like http://hsqldb.org/ may provide the functionality you need, with the ability to write the data to disk rather than keeping the whole thing in memory.
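To sketch that approach: with the HSQLDB jar on the classpath, a file-backed table can act as a disk-based key/value store through plain JDBC. This is only a sketch under that assumption (the JDBC URL and table name below are illustrative, and it will not run without the HSQLDB driver):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class DiskBackedMap {
    public static void main(String[] args) throws SQLException {
        // A "jdbc:hsqldb:file:..." URL stores data on disk, not in memory.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hsqldb:file:mapdata", "SA", "")) {
            try (PreparedStatement create = conn.prepareStatement(
                    "CREATE TABLE IF NOT EXISTS kv "
                    + "(k VARCHAR(255) PRIMARY KEY, v VARCHAR(255))")) {
                create.execute();
            }
            try (PreparedStatement put = conn.prepareStatement(
                    "INSERT INTO kv (k, v) VALUES (?, ?)")) {
                put.setString(1, "someKey");
                put.setString(2, "someValue");
                put.executeUpdate();
            }
            try (PreparedStatement get = conn.prepareStatement(
                    "SELECT v FROM kv WHERE k = ?")) {
                get.setString(1, "someKey");
                try (ResultSet rs = get.executeQuery()) {
                    if (rs.next()) {
                        System.out.println(rs.getString(1));
                    }
                }
            }
        }
    }
}
```

The trade-off is latency: disk-backed lookups are slower than an in-memory map, but the memory footprint stays small.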
I would definitely take a step back and ask whether a map is the right data structure for GBs of data.
Gigaspaces Datagrid sounds like the kind of thing you are looking for. (Not free though)