Implementing a TRIE with a nested hash map?

Question 1

I use

Map<Character, TrieMap<K, V>> children = new TreeMap<>();

for my implementation of a TrieMap. It works very nicely.

The benefits of using the normal Node structure is that you can wrap links to the parent map into the structure so you can iterate the map more easily. I did not take that route and build a Stack while iterating because I wanted to ensure I did not bloat the structure with unnecessary content. I then build the stack while iterating.

The primary benefit of a Trie is it's space-saving features when the keys are similar - in my opinion it is foolish to then add unnecessary weight to the structure. Thus my decision to only use the TreeMap. The other alternative would have been an Array or List but to me neither of those are as space efficient as a TreeMap when the data is patterned well for a Trie.

Actually - the code looks more like:

/**
 * Map each character to a sub-trie.
 *
 * Could replace this with a 256 entry array of Tries but this will handle multi-byte character sets and I can discard
 * empty maps.
 *
 * Maintained at null until needed (for better memory footprint).
 *
 */
private Map<Character, TrieMap<K, V>> children = null;

....

/**
 * Create a new set of children.
 *
 * I've always wanted to name a method something like this.
 */
private void makeChildren() {
  if (children == null) {
    // Use a TreeMap to ensure sorted iteration.
    children = new TreeMap<>();
  }
}

so I further reduce the memory footprint by ensuring that a childless node does not hold a wasteful empty Map (although I could just as easily have used Collections.emptyMap()).

Question 2

That's a good question and one I'm currently pondering.

Glenn's answer doesn't take into account the prefix storing nature of a Trie (or prefix tree, to give it another name). If all you want is a dictionary then a Hashtable is a better choice but if you want to do some auto-complete style things then the Trie is ideal. There's also nothing I understand about a Trie that requires it to be sorted.

The 'classic' approach I imagine you refer to is one of using an character-indexed array, O(1) lookup, to reference children of any node. This is fast and space efficient for small alphabets but as you observe quickly becomes space prohibitive for very large character sets (Unicode).

One alternative you mention is to have a have HashMap at each node that maps each character to a child node. You preserve the constant lookup time of the indexed array (assuming a true hash implementation) and you hopefully don't use thousands of bytes per node storing empty character slots.

Looks like a win all round to me so I'm also wondering why I don't see it mentioned very often.

One hybrid approach I did consider was, if you know the entire alphabet upfront, keeping a hash map of char->array index (contiguous index into your child array) for the best of both worlds. Simply scan your dictionary up front and tell the Trie which unicode chars you'll be using at construction.

Question 3

If have just 256 entries per node, why would you consider a hashmap at all? If you make the hashmap smaller, you increase the risk of collisions at lower nodes and the nice properties are gone... If you make it dynamic, you get all the management overhead...

Question 4

when you declare your nested hashmap - how deep will you make it If it's not a fixed depth - then you have just reproduced a "node" approach using a hashmap as a node
hashmap->hashmap->hashmap will take more space and be slower than just using a hash of string.
Hashmaps aren't sorted - so now you have an unsorted map and that really isn't a trie