Binary Heap vs Binary Tree C++

Question 1

The constant-time lookup of the minimum and maximum is achieved by storing references to the leftmost and the rightmost nodes of the RB-tree in the header structure of the map. Here is a comment from the source code of the RB-tree, a template from which the implementation of std::set, std::map, and std::multimap are derived:

the header cell is maintained with links not only to the root but also to the leftmost node of the tree, to enable constant time begin(), and to the rightmost node of the tree, to enable linear time performance when used with the generic set algorithms (set_union, etc.)

The tradeoff here is that these pointers need to be maintained, so insertion and deletion operations would have another "housekeeping operation" to do. However, insertions and deletions are already done in logarithmic time, so there is no additional asymptotic cost for maintaining these pointers.

Question 2

At least in a typical implementation, std::set (and std::map) will be implemented as a threaded binary tree¹. In other words, each node contains not only a pointer to its (up to) two children, but also to the previous and next node in order. The set class itself then has pointers to not only the root of the tree, but also to the beginning and end of the threaded list of nodes.

To search for a node by key, the normal binary pointers are used. To traverse the tree in order, the threading pointers are used.

This does have a number of disadvantages compared to a binary heap. The most obvious is that it stores four pointers for each data item, where a binary heap can store just data, with no pointers (the relationships between nodes are implicit in the positions of the data). In an extreme case (e.g., std::set<char>) this could end up using a lot more storage for the pointers than for the data you actually care about (e.g., on a 64-bit system you could end up with 4 pointers of 64-bits apiece, to store each 8-bit char). This can lead to poor cache utilization, which (in turn) tends to hurt speed.

In addition, each node will typically be allocated individually, which can reduce locality of reference quite badly, again hurting cache usage and further reducing speed.

As such, even though the threaded tree can find the minimum or maximum, or traverse to the next or previous node in O(1), and search for any given item in O(log N), the constants can be substantially higher than doing the same with a heap. Depending on the size of items being stored, total storage used may be substantially greater than with a heap as well (worst case is obviously when only a little data is stored in each node).

_{1. With some balancing algorithm applied--most often red-black, but sometimes AVL trees or B-trees. Any number of other balanced trees could be using (e.g., alpha-balanced trees, k-neighbor trees, binary b-trees, general balanced trees).}

Question 3

I'm not an expert at maps, but returning the first element of a map would be considered a 'root' of sorts. there is always a pointer to it, so the look up time of it would be instant. The same would go for a BSTree as it clearly has a root node then 2 nodes off of it and so on (which btw I would look into using an AVL Tree as the look up time for the worst case scenario is much better than the BSTree).

The O(log(N)) is normally only used to get an estimate of the worst case scenario. So if you have a completely unbalanced BSTree you'll actually have O(N), so if your searching for the last node you have to do a comparison to every node.

I'm not too sure about your last statement though a map is different from a self-balancing tree, those are called AVL Trees (or thats what I was taught). A map uses 'keys' to organize objects in a specific way. The key is found by serializing the data into a number and the number is for the most part placed in a list.