Question

A query on a non-indexed column will result in O(n) because it has to search the entire table. Adding an index allows for O(log n) due to binary search. Is there a way for databases to achieve O(1) using the same technique a hash table uses (or perhaps another way) if you are searching on a unique key?

Was it helpful?

Solution

Hash-based indices are supported by some rdbms under some conditions. For example, MySQL supports the syntax CREATE INDEX indexname USING HASH ON tablename (cols…) but only if the named table is stored in memory, not if it is stored on disk. Clustered tables are a special case.

I guess the main reason against widespread use of hash indices in rdbms is the fact that they scale poorly. Since disk I/O is expensive, a very thin index will require lots of I/O for little gain in information. Therefore you would prefer a rather densely populated index (e.g. keep the filled portion between ⅓ and ⅔ at all times), which can be problematic in terms of hash collisions. But even more problematic: as you insert values, such a dense index might become too full, and you'd have to increase the size of the hash table fairly often. Doing so will mean completely rebuilding the hash table. That's an expensive operation, which requires lots of disk I/O, and will likely block all concurrent queries on that table. Not something to look forward to.

A B-tree on the other hand can be extended without too much overhead. Even inceasing its depth, which is the closest analogon to an extension of the hash table size, can be done more cheaply, and will be required less often. Since B-trees tend to be shallow, and since disk I/O tends to outweight anything you do in memory, they are still the preferred solution for most practical applications. Not to mention the fact that they provided cheap access to ranges of values, which isn't possible with a hash.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top