A red black tree with the same key multiple times: store collections in the nodes or store them as multiple nodes?

https://stackoverflow.com/questions/12635502

04-07-2021
|

Question

Apparently you could do either, but the former is more common.

Why would you choose the latter and how does it work?

I read this: http://www.drdobbs.com/cpp/stls-red-black-trees/184410531; which made me think that they did it. It says:

insert_always is a status variable that tells rb_tree whether multiple instances of the same key value are allowed. This variable is set by the constructor and is used by the STL to distinguish between set and multiset and between map and multimap. set and map can only have one occurrence of a particular key, whereas multiset and multimap can have multiple occurrences.

Although now i think it doesnt necessarily mean that. They might still be using containers.

I'm thinking all the nodes with the same key would have to be in a row, because you either have to store all nodes with the same key on the right side or the left side. So if you store equal nodes to the right and insert 1000 1s and one 2, you'd basically have a linked list, which would ruin the properties of the red black tree.

Is the reason why i can't find much on it that it's just a bad idea?

Solution

down side of store as multiple nodes:

expands tree size, which make search slower.
if you want to retrieve all values for key K, you need M*log(N) time, where N is number of total nodes, M is number of values for key K, unless you introduce extra code (which complicates the data structure) to implement linked list for these values. (if storing collection, time complexity only take log(N), and it's simple to implement)
more costly to delete. with multi-node method, you'll need to remove node on every delete, but with collection-storage, you only need to remove node K when the last value of key K is deleted.

Can't think of any good side of multi-node method.

OTHER TIPS

Binary Search trees by definition cannot contain duplicates. If you use them to produce a sorted list throwing out the duplicates would produce an incorrect result. I am working on an implementation of Red Black trees in PHP when I ran into the duplicate issue. We are going to use the tree for sorting and searching. I am considering adding an occurrence value to the node data type. When a duplicate is encountered just increment occurrence. When walking the tree to produce output just repeat the value by the number of occurrences. I think I would still have a valid BST and avoid having a whole chain of duplicate values which preserve the optimal search time.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow