Question

When i look up Tries and radix trees, such as http://en.wikipedia.org/wiki/Compact_prefix_tree and http://en.wikipedia.org/wiki/Trie, I see no specific thing on lexicographic-ordering of the children of a node.

so, in this trie for instance (the only figure right up on the page) the children of the root can better be ordered as 'A', 'i', 't' from left to right.

Tries/radix trees are for retrieval-- not for frequent update. so, this kind of ordering doesn't cost much especially on rare tree updates, algorithmically easy/straightforward and adds up some to the speed during value look-up/retrieval.

what am i missing?

i'm looking for arguments on/against this.

Was it helpful?

Solution

I assume you want to order the children so that you can search them more quickly. I think you'll find, though, that the number of children for a given node is pretty small--small enough that the difference between binary search and sequential search doesn't really matter. Or perhaps even so small that sequential search is faster than binary search.

For example, it makes no sense at all to lexicographically order the children of the letter 'q' because it has so few children. Binary search on the few letters that follow 'q' would be slower than sequential search. It makes much more sense to order the children by frequency. 'u' would be the first child, and the item that's selected far more frequently than any others.

I don't have a table of bigram frequencies in front of me, but I suspect you'll find that in most cases the number of likely children for a particular letter does not justify lexicographical ordering, and that ordering by frequency leads to much better performance. The possible exception is at the start of words, but even then I suspect that it would make much more sense to order by frequency.

You could build such a trie and examine the nodes. See how many children the typical node has, and see what the frequencies are.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top