Question

I am trying to create a key/value database with 300,000,000 key/value pairs of 8 bytes each (both for the key and the value). The requirement is to have a very fast key/value mechanism which can query about 500,000 entries per second.

I tried BDB, Tokyo DB, Kyoto DB, and levelDB and they all perform very bad when it comes to databases at that size. (Their performance is not even close to their benchmarked rate at 1,000,000 entries).

I cannot store my database in memory because of hardware limitations (32 bit software), so memcached is out of the question.

I cannot use external server software as well (only a database module), and there is no need for multi-user support at all. Of course server software cannot hold 500,000 queries per second from a single endpoint anyways, so that leaves out Redis, Tokyo tyrant, etc.

Was it helpful?

Solution

David Segleau, here. Product Manager for Berkeley DB.

The most common problem with BDB performance is that people don't configure the cache size, leaving it at the default, which is pretty small. The second most common problem is that people write application behavior emulators that do random look-ups (even though their application is not really completely random) which forces them to read data out of cache. The random I/O then takes them down a path of conclusions about performance that are not based on the simulated application rather than the actual application behavior.

From your description, I'm not sure if your running into these common problems or maybe into something else entirely. In any case, our experience is that Berkeley DB tends to perform and scale very well. We'd be happy to help you identify any bottlenecks and improve your BDB application throughput. The best place to get help in this regard would be on the BDB forums at: http://forums.oracle.com/forums/forum.jspa?forumID=271. When you post to the forum it would be useful to show the critical query segments of your application code and the db_stat output showing the performance of the database environment.

It's likely that you will want to use BDB HA/Replication in order to load balance the queries across multiple servers. 500K queries/second is probably going to require a larger multi-core server or a series of smaller replicated servers. We've frequently seen BDB applications with 100-200K queries/second on commodity hardware, but 500K queries per second on 300M records in a 32-bit application is likely going to require some careful tuning. I'd suggest focusing on optimizing the performance of a the queries on the BDB application running on a single node, and then use HA to distribute that load across multiple systems in order to scale your query/second throughput.

I hope that helps.

Good luck with your application.

Regards,

Dave

OTHER TIPS

I found a good benchmark comparison web page that basically compares 5 renowned databases:

  • LevelDB
  • Kyoto TreeDB
  • SQLite3
  • MDB
  • BerkeleyDB

You should check it out before making your choice: http://symas.com/mdb/microbench/.

P.S - I know you've already tested them, but you should also consider that your configuration for each of these tests was not optimized as the benchmark shows otherwise.

Try ZooLib.

It provides a database with a C++ API, that was originally written for a high-performance multimedia database for educational institutions called Knowledge Forum. It could handle 3,000 simultaneous Mac and Windows clients (also written in ZooLib - it's a cross-platform application framework), all of them streaming audio, video and working with graphically rich documents created by the teachers and students.

It has two low-level APIs for actually writing your bytes to disk. One is very fast but is not fault-tolerant. The other is fault-tolerant but not as fast.

I'm one of ZooLib's developers, but I don't have much experience with ZooLib's database component. There is also no documentation - you'd have to read the source to figure out how it works. That's my own damn fault, as I took on the job of writing ZooLib's manual over ten years ago, but barely started it.

ZooLib's primarily developer Andy Green is a great guy and always happy to answer questions. What I suggest you do is subscribe to ZooLib's developer list at SourceForge then ask on the list how to use the database. Most likely Andy will answer you himself but maybe one of our other developers will.

ZooLib is Open Source under the MIT License, and is really high-quality, mature code. It has been under continuous development since 1990 or so, and was placed in Open Source in 2000.

Don't be concerned that we haven't released a tarball since 2003. We probably should, as this leads lots of potential users to think it's been abandoned, but it is very actively used and maintained. Just get the source from Subversion.

Andy is a self-employed consultant. If you don't have time but you do have a budget, he would do a very good job of writing custom, maintainable top-quality C++ code to suit your needs.

I would too, if it were any part of ZooLib other than the database, which as I said I am unfamiliar with. I've done a lot of my own consulting work with ZooLib's UI framework.

300 M * 8 bytes = 2.4GB. That will probably fit into memory (if the OS does not restrict the address space to 31 bits) Since you'll also need to handle overflow, (either by a rehashing scheme or by chaining) memory gets even tighter, for linear probing you probably need > 400M slots, chaining will increase the sizeof item to 12 bytes (bit fiddling might gain you a few bits). That would increase the total footprint to circa 3.6 GB.

In any case you will need a specially crafted kernel that restricts it's own "reserved" address space to a few hundred MB. Not impossible, but a major operation. Escaping to a disk-based thing would be too slow, in all cases. (PAE could save you, but it is tricky)

IMHO your best choice would be to migrate to a 64 bits platform.

500,000 entries per second without holding the working set in memory? Wow.

In the general case this is not possible using HDDs and even difficult SSDs.

Have you any locality properties that might help to make the task a bit easier? What kind of queries do you have?

We use Redis. Written in C, its only slightly more complicated than memcached by design. Never tried to use that many rows but for us latency is very important and it handles those latencies well and lets us store the data in the disk

Here is a bench mark blog entry, comparing redis and memcached.

Berkely DB could do it for you. I acheived 50000 inserts per second about 8 years ago and a final database of 70 billion records.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top