Question

I heard that tries are less efficient than hash tables for performing lookups when the data strictures are stored on disk rather than main memory. Why would this be the case?

Was it helpful?

Solution

On disk, random access is slow because in order to read bytes at a particular location, the hard drive has to physically spin around to put those bytes under the read head. The cost of a random access on disk can be millions of times slower than a comparable access to RAM.

On top of this, whenever you read data from disk, a block of memory called a page is read from disk, not just the bytes you asked for. This means that if you read some data from disk, accessing the bytes near that byte will likely be very fast because that data will have been read from the same page and loaded into RAM. This means that sequential access in an array on disk will be fast, since after the first (slow) read to get the bytes for the first array element to read, the bytes for the next array elements will probably already be loaded and available.

Think about what this means for tries versus linear probing hash tables. A trie is a tree structure where lookups require following lots of pointers to nodes laid out in no particular order in memory. This means that the cost of a trie lookup will likely be one disk read per character of the string, which is terribly inefficient. On the other hand, if you have a hash table using linear probing, the cost of a lookup will (roughly) be the cost of one disk read, since after finding the initial spot in the table where the value should be the array reads should not require future disk reads.

Note that not all tries and all hash tables have this property. Cache-oblivious tries are tries that are specifically constructed to minimize disk reads and can be very quick in external memory. Many hash tables, such as chained hash tables or double hashing tables, have more scattered lookup patterns and thus incur more disk reads.

Hope this helps!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top