Question

I'm looking for a persistent hash structure in java, a simple key-value store, where key is a unique string and value is an int. The value of a key is to be incremented each time an existing key is added to the store.

I need this to be quite large - possibly 500m - 1bn keys. I've been evaluating tokyo-cabinet http://fallabs.com/tokyocabinet/javadoc/ but not sure how well it will scale - insert times seem to be getting longer as the hash grows.

Any ideas on what might be appropriate?

Thanks

Edit: In order to reduce disk I/O I'm going to be caching data in an in-memory HashMap, then updating the persistent hash in one go when the cache grows to a certain size.

Edit2: One of the reasons for the persistence is that I have limited RAM, 4GB, so I can't fit a big struture into memory.

Was it helpful?

Solution

I thing Megamap is what you are looking for: http://megamap.sourceforge.net/. Here is a short description of Megamap from its homepage:

MegaMap is a Java implementation of a map (or hashtable) that can store an unbounded amount of data, limited only by the amount of disk space available. Objects stored in the map are persisted to disk. Good performance is achieved by an in-memory cache. The MegaMap can, for all practical reasons, be thought of as a map implementation with unlimited storage space.

OTHER TIPS

Use a database not a hash. Even for a database 500M rows is getting quite large. How many updates are you expecting per second?

Have you checked out Berkeley BD Java Edition? They have a Collections-compatible API (see also the Javadoc for StoredMap).

So, if I understand correctly, Redis might be an option. You can issue INCR [key] commands to atomically increment the value associated with that key. If the key does not exist, its set to zero and then incremented (resulting in one). According to the docs, INCR is a constant-time operation. Speed is a primary design goal for Redis.

Redis is able to persist itself to file, and you can control the parameters on how that happens.

I think Memcached is good option for your case along with a suitable database in the backend.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top