Question

I need a tool similar to cdb (constant database) that would allow me to store large sets of data (in the range of hundreds of gigabytes) in indexed files. cdb is an ideal candidate, but it has a 2 GB file size limit, so it's not suitable. The functionality I'm looking for is a persistent key-value store that supports binary keys and values. Once created, the database is read-only and will never be modified. Can you recommend a tool? Also, the storage overhead should be small, because I will be storing billions of records.

BTW, I'm looking for an embeddable database management library, not a standalone server. Something that can be used inside a C program.

Thanks, RG


Solution

Another option is mcdb, which extends Daniel J. Bernstein's cdb.

https://github.com/gstrauss/mcdb/

mcdb supports very large constant databases and is faster than cdb for both database creation and database access. Creating a database of hundreds of gigabytes will still take some time, but mcdb can build a gigabyte-sized database in a few seconds when the data is in the filesystem cache, or in a minute or so when starting from a cold cache.

https://github.com/gstrauss/mcdb/blob/master/t/PERFORMANCE

(Disclosure: I am the author of mcdb)
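
For reference, a read-only lookup against a cdb-family database follows the pattern sketched below. This is a minimal sketch written against the tinycdb flavor of the cdb API (cdb_init/cdb_find/cdb_read); mcdb declares its own interface in mcdb.h, with different names and 64-bit-capable offsets, so take this as the general access pattern rather than mcdb's exact API. The file name and key are made up.

    /* Read-only lookup sketch, tinycdb flavor of the cdb API.
     * Compile with: cc lookup.c -lcdb
     * mcdb's interface (see mcdb.h) follows the same pattern but
     * with different names and 64-bit-capable offsets. */
    #include <cdb.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *key = "some-key";            /* hypothetical key */
        int fd = open("data.cdb", O_RDONLY);     /* hypothetical file */
        if (fd < 0) { perror("open"); return 1; }

        struct cdb db;
        if (cdb_init(&db, fd) != 0) { close(fd); return 1; }

        if (cdb_find(&db, key, strlen(key)) > 0) {
            unsigned len = cdb_datalen(&db);     /* length of the value */
            char *val = malloc(len + 1);
            cdb_read(&db, val, len, cdb_datapos(&db));
            val[len] = '\0';
            printf("found: %s\n", val);
            free(val);
        } else {
            puts("not found");
        }

        cdb_free(&db);
        close(fd);
        return 0;
    }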

OTHER TIPS

There's hamsterdb (I'm the author), Berkeley DB, and Tokyo Cabinet.

hamsterdb uses a B-tree and therefore keeps your data sorted. Tokyo Cabinet uses a hash table, so the data is not sorted. Berkeley DB can do both.

Needless to say what I would recommend ;)

All of them can be linked into a C application. None of them should have a 2 GB limit.
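
As an illustration of how these embed into a C program, here is a minimal Berkeley DB sketch that opens a B-tree database and does one put and one get. The file name and key/value contents are made up and error handling is trimmed; hamsterdb and Tokyo Cabinet follow a similar open/put/get/close shape with their own APIs.

    /* Minimal Berkeley DB B-tree sketch; compile with: cc kv.c -ldb
     * File name and key/value contents are hypothetical. */
    #include <db.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        DB *dbp;
        DBT key, data;

        /* Create a handle and open (or create) a B-tree database file. */
        if (db_create(&dbp, NULL, 0) != 0) return 1;
        if (dbp->open(dbp, NULL, "store.db", NULL,
                      DB_BTREE, DB_CREATE, 0664) != 0) return 1;

        /* DBTs must be zeroed before use; keys and values are binary. */
        memset(&key, 0, sizeof(key));
        memset(&data, 0, sizeof(data));
        key.data = "hello"; key.size = 5;
        data.data = "world"; data.size = 5;

        dbp->put(dbp, NULL, &key, &data, 0);   /* store one record */

        memset(&data, 0, sizeof(data));
        if (dbp->get(dbp, NULL, &key, &data, 0) == 0)
            printf("%.*s\n", (int)data.size, (char *)data.data);

        dbp->close(dbp, 0);                    /* flush and close */
        return 0;
    }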

bye Christoph

If your values are large and your keys are small, you can also consider Redis: http://redis.io
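
Note that Redis is a standalone server rather than an embeddable library, so a C program would talk to it over a socket, for example with the hiredis client. A minimal sketch follows; the host, port, and key/value contents are made up.

    /* Minimal hiredis sketch for binary-safe SET/GET.
     * Compile with: cc kv_redis.c -lhiredis
     * Host, port, and key/value contents are hypothetical. */
    #include <hiredis/hiredis.h>
    #include <stdio.h>

    int main(void)
    {
        redisContext *c = redisConnect("127.0.0.1", 6379);
        if (c == NULL || c->err) return 1;

        const char key[] = "k1";
        const char val[] = "large binary value";

        /* %b takes a pointer plus a size_t, so keys and values
         * may contain embedded NUL bytes. */
        redisReply *r = redisCommand(c, "SET %b %b",
                                     key, (size_t)(sizeof(key) - 1),
                                     val, (size_t)(sizeof(val) - 1));
        if (r) freeReplyObject(r);

        r = redisCommand(c, "GET %b", key, (size_t)(sizeof(key) - 1));
        if (r != NULL && r->type == REDIS_REPLY_STRING)
            printf("%.*s\n", (int)r->len, r->str);
        if (r) freeReplyObject(r);

        redisFree(c);
        return 0;
    }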

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow