AVL tree vs. B-tree

https://stackoverflow.com/questions/2734692

02-10-2019
|

Question

How is an AVL tree different from a B-tree?

Solution

AVL trees are intended for in-memory use, where random access is relatively cheap. B-trees are better suited for disk-backed storage, because they group a larger number of keys into each node to minimize the number of seeks required by a read or write operation. (This is why B-trees are often used in file systems and databases, such as SQLite.)

OTHER TIPS

Both the AVL tree and the B-tree are similar in that they are data structures that, through their requirements, cause the height of their respective trees to be minimized. This "shortness" allows searching to be performed in O(log n) time, because the largest possible number of reads corresponds to the height of the tree.

This is an AVL tree, and is a binary search tree at its core. However, it is self-balancing, which means that as you add elements to the tree, it will restructure itself to maintain as uniform of a height as it can. Basically, it will not allow long branches.

A B-tree also does this, but through a different re-balancing scheme. It's a little too complicated to write out, but if you Google search "B-tree animation" there are some really good applets out there that explain a B-tree pretty well.

They are different in that an AVL tree is implemented with memory-based solutions in mind, while a B-tree is implemented with disk-based solutions in mind. AVL trees are not designed to hold massive collections of data, as they use dynamic memory allocation and pointers to the next block of memory. Obviously, we could replicate the AVL tree's functionality with disk locations and disk pointers, but it would be much slower because we would still have a significant number of reads to read a tree of a very large size.

When the data collection is so large that it doesn't fit in memory, the solution is a B-tree (interesting factoid: there is no consensus on what the "B" actually stands for). A B-tree holds many children at one node and many pointers to children node. This way, during a disk read (which can take around 10 ms to read a single disk block), the maximum amount of relevant node data is returned, as well as pointers to "leaf node" disk blocks. This allows retrieval time of data to be amortized to log(n) time, making the B-tree especially useful for database and large dataset retrieval implementations.

An AVL tree is a self-balancing binary search tree, balanced to maintain O(log n) height.

A B-tree is a balanced tree, but it is not a binary tree. Nodes have more children, which increases per-node search time but decreases the number of nodes the search needs to visit. This makes them good for disk-based trees. For more details, see the Wikipedia article.

AVL is self balancing, guaranteeing that all operations are O(log n) in average and worst cases.

The B-tree uses all of the ideas described above. In particular, a B-tree:

1)keeps keys in sorted order for sequential traversing
2)uses a hierarchical index to minimize the number of disk reads
3)uses partially full blocks to speed insertions and deletions
4)keeps the index balanced with an elegant recursive algorithm

In addition, a B-tree minimizes waste by making sure the interior nodes are at least half full. A B-tree can handle an arbitrary number of insertions and deletions.

An AVL tree is a self balancing binary tree which enable O(lgN) average and worst case for search insert and delete operations. It is used for in memory backed search trees (moderate sized datasets).

A B-Tree is primarily used as a storage backed search tree for very large datasets because it requires less reads to disk (since each node contains N keys where N >1). A B-Tree is said to be a (N,N+1) B-Tree where N is the number of keys per node and N+1 is the number of children per node. The more keys per node the less times you will need to read from disk and it will also naturally be a shallower tree (less levels).

They are very different indeed, though they serve largely the same purpose: supporting an associative table. Historically, the AVL tree were taken to outperform the B-tree for in-memory operations, but that hold especially true when accessing memory was cheap(er) when compared to CPU cycles.

Though generally used in databases store for variable length keys, the B-trees perform best for fixed length and short records (key + data). For such uses, that may significantly outperform the AVL trees for in-memory uses, both in terms of memory footprint (as they store data more compactly) and speed (they'd have lot better cache locality).

L2 is a data structures library implementing very fast associative tables and sequences over B-trees. It also has AVL trees, and making a comparison between the performance of two is easy.

Other answerers have already provided quite in-depth technical details about both AVL and B-Tree but I would like to add a relatively novice information regarding these two :) -

AVL tree is a binary tree while B-tree is a multi-way tree (N-ary tree) i.e. Any node in AVL tree can have at max two child nodes and one piece of information/data while any node in a B-tree can have n nodes and n-1 piece of information/data. For B-tree, n is also known as its order.

In layman terms -

AVL tree and Binary search tree are both same but AVL tree has a constraint that the difference between the height of left sub tree and right sub tree should either be 0, 1 or -1.

If any binary search tree meet these conditions it will be called as AVL tree.

Binary search tree + HEIGHT CONDITION is an AVL tree.

Refer: Introduction to Algorithms by Cormen https://books.google.co.in/books...

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow