Pergunta

I am having some confusion on the runtimes of the find_min operation on a Binary Search Tree and a Binary Heap. I understand that returning the min in a Binary Heap is a O(1) operation. I also understand why in theory, returning the minimum element in a Binary Search Tree is O(log(N)) operation. Much to my surprise, when I read up on the data structure in C++ STL, the documentation states that returning an iterator to the first element in a map (which is the same as returning the minimum element) occurs in constant time! Shouldnt this be getting returned in logarithmic time? I need someone to help me understand what C++ is doing under the hood to return this in constant time. Because then, there is no point in really using a binary heap in C++, the map data structure would then support retrieve min and max in both constant time, delete and search in O(log(N)) and keeps everything sorted. This means that the data structure has the benefits of both a BST and Binary Heap all tied up in one!

I had an argument about this with an interviewer (not really an argument) but I was trying to explain to him that in C++ returning min and max from map in C++ (which is a self-balancing binary search tree) occurs in constant time. He was baffled and kept saying I was wrong and that a binary heap was the way to go. Clarification would be much appreciated

Foi útil?

Solução

The constant-time lookup of the minimum and maximum is achieved by storing references to the leftmost and the rightmost nodes of the RB-tree in the header structure of the map. Here is a comment from the source code of the RB-tree, a template from which the implementation of std::set, std::map, and std::multimap are derived:

the header cell is maintained with links not only to the root but also to the leftmost node of the tree, to enable constant time begin(), and to the rightmost node of the tree, to enable linear time performance when used with the generic set algorithms (set_union, etc.)

The tradeoff here is that these pointers need to be maintained, so insertion and deletion operations would have another "housekeeping operation" to do. However, insertions and deletions are already done in logarithmic time, so there is no additional asymptotic cost for maintaining these pointers.

Outras dicas

At least in a typical implementation, std::set (and std::map) will be implemented as a threaded binary tree1. In other words, each node contains not only a pointer to its (up to) two children, but also to the previous and next node in order. The set class itself then has pointers to not only the root of the tree, but also to the beginning and end of the threaded list of nodes.

To search for a node by key, the normal binary pointers are used. To traverse the tree in order, the threading pointers are used.

This does have a number of disadvantages compared to a binary heap. The most obvious is that it stores four pointers for each data item, where a binary heap can store just data, with no pointers (the relationships between nodes are implicit in the positions of the data). In an extreme case (e.g., std::set<char>) this could end up using a lot more storage for the pointers than for the data you actually care about (e.g., on a 64-bit system you could end up with 4 pointers of 64-bits apiece, to store each 8-bit char). This can lead to poor cache utilization, which (in turn) tends to hurt speed.

In addition, each node will typically be allocated individually, which can reduce locality of reference quite badly, again hurting cache usage and further reducing speed.

As such, even though the threaded tree can find the minimum or maximum, or traverse to the next or previous node in O(1), and search for any given item in O(log N), the constants can be substantially higher than doing the same with a heap. Depending on the size of items being stored, total storage used may be substantially greater than with a heap as well (worst case is obviously when only a little data is stored in each node).


1. With some balancing algorithm applied--most often red-black, but sometimes AVL trees or B-trees. Any number of other balanced trees could be using (e.g., alpha-balanced trees, k-neighbor trees, binary b-trees, general balanced trees).

I'm not an expert at maps, but returning the first element of a map would be considered a 'root' of sorts. there is always a pointer to it, so the look up time of it would be instant. The same would go for a BSTree as it clearly has a root node then 2 nodes off of it and so on (which btw I would look into using an AVL Tree as the look up time for the worst case scenario is much better than the BSTree).

The O(log(N)) is normally only used to get an estimate of the worst case scenario. So if you have a completely unbalanced BSTree you'll actually have O(N), so if your searching for the last node you have to do a comparison to every node.

I'm not too sure about your last statement though a map is different from a self-balancing tree, those are called AVL Trees (or thats what I was taught). A map uses 'keys' to organize objects in a specific way. The key is found by serializing the data into a number and the number is for the most part placed in a list.

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top