Question

In a common implementation of a B+ tree, we can assume that keys have a fixed length (e.g. 25 bytes). We can then define that each node must have a minimum number of keys, and a maximum.

If I wanted the tree to accept variable length keys, what should I modify? What if I say that the node must have at least 2 keys, but the key which i'm trying to insert is so big that it doesn't fit into the Block that holds the node?

Was it helpful?

Solution

The simple solution is to store the keys as pointers (wrapped in a type that overrides the relative operators etc) rather than values, but that of course damages the locality that is part of the point of using B+ trees.

That said, the larger the items, the less it matters that items are adjacent in memory. Huge items won't fit even one to a cache page, let alone several in the same page.

Another relatively simple approach is to use a union type or placement new or whatever to allocate items within a memory-for-item type that's big enough for all item types you might use. You still have a fixed number of bytes per item, but the items don't necessarily use all those bytes.

If you're willing to do the work, you could have variable-sized nodes. You'll have some hassles working with those nodes, of course, depending on how you arrange the in-node data structure to cope with that. You might have a small array of item-pointers within the node, for instance, pointing to the items which are also inside the node (not separately allocated on the heap).

Also, every time you change a node you may need to reallocate it. Even if all you're doing is rebalancing, that might move a huge item from one node into another, and even though the destination node has room in the sense of having a slot free for an item it may not have enough bytes to store the value.

In a sense, each node would be a mini-heap in which you can allocate or release space for items big or small, but sometimes you'd have to go back to the heap proper to replace that mini-heap with a bigger or smaller one.

It's again worth mentioning that if the items are that huge, locality within a node probably isn't relevant anyway.

I've implemented B+-style multiway trees in memory before myself, but I've never gone to this extreme.

OTHER TIPS

You can keep the rest of a big key in an overflow page, like there.

Use hashing. A hash is a fixed-size representation of a key. For good hashing functions see http://www.cse.yorku.ca/~oz/hash.html.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top