Question

I am kinda new to datastructures and I am implementing a trie to disambiguate a database of names using edit distance. I am using the following implementation of the trie:

http://stevehanov.ca/blog/index.php?id=114

which is basically:

class TrieNode:

    def __init__(self):
       self.word = None
       self.children = {}

       global NodeCount
       NodeCount += 1

    def insert( self, word ):
       node = self
       for letter in word:
            if letter not in node.children: 
                node.children[letter] = TrieNode()

            node = node.children[letter]

       node.word = word

# read dictionary file into a trie
trie = TrieNode()
for name in names:
    WordCount += 1
    trie.insert( name )

This does the job beautifully as it inserts all the names into a trie. Now, I go through the list of names I have one by one, and use the trie to return a list of all names that are at a certain edit distance from the passed name. I want to then delete all the names from the trie that were returned in the list.

Is there a fast way to do that?

Thanks!

Was it helpful?

Solution

There are two ways to do this, depending on whether you want to check whether you're removing the last path through any internal node (which makes removes slightly slower, but potentially makes searches after the removes slightly faster). Both ways are trivial to do recursively, but if you want to unroll it iteratively (as your insert does), not checking is easier, so I'll do that.

def delete(self, word):
    node = self
    for letter in word[:-1]:
        if letter not in node.children:
            return False
        node = node.children[letter]
    if word[-1] in node.children:
        del node.children[letter]
        return True
    return False

Can you make this faster? Yes, but it may not matter.

First, you know that the nodes will always exist, so you can remove some of the error checking. More importantly, if you can make the search function return the nodes, instead of just their values, that will make things a little faster. If you can add backlinks up the trie, that means you can erase the node in constant time instead of repeating the search. If you don't want backlinks up the trie, you can get the exact same benefit by returning a zipper instead of a node—or, more simply, just returning a stack of nodes.

But really, the worst case here is just doubling the work, not increasing the algorithmic complexity or multiplying by a large factor, so simple probably wins.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top