Question

I'm going to index a BDB with keys that look a lot like directory paths ('/foo/bar', '/foo/baz', etc, with levels of slashes generally < 10).

Does anybody have any experience with using a Btree prefix comparison routine[1] for this? Are the savings worthwhile? Any references to experience papers on this subject?

[1] http://www.stanford.edu/class/cs276a/projects/docs/berkeleydb/ref/am_conf/bt_prefix.html

Was it helpful?

Solution

You may want to post your question to the Berkeley DB forum on OTN here. There is a active community of Support, Engineering and BDB application developers that interact directly in this forum.

What I've heard from customers and our own use of Btree prefixing in the BDB XML product is that it can significantly reduce the size of the internal btree nodes, also improving the efficiency of the cache, reducing I/O and thereby improving the efficiency of individual key lookups. This is also stated in the documentation about the btree prefix function located here. The extent of the performance improvement depends on a) your data, b) your application data access patterns. If the key value is mostly identical, then you will save more space in your btree index. If your data access patterns perform many key lookups and by having a smaller btree you reduce the number of I/Os that you have to perform, the performance will improve commensurately.

Please note that if you provide a btree prefix function you must also provide a compatible btree comparison function.

For BDB XML we saw a 20-30 reduction in btree size.

The lexicographic key comparison/prefix functions with are used by default in Berkeley DB may already be providing the behavior that you want.

Good luck with your research.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top