autocomplete algorithms, papers, strategies, etc
-
03-07-2019 - |
Question
I'm wondering if anyone has good resources to read or code to experiment for "autcomplete"
I would like to know what's the theory behind autocompletion, where to start what are the commonn mistakes etc.
I found fascinating the way products like Enso, Launchy, Google chrome and even tcsh perform their auto complete, I started my self just for curiosity some sample code and I got to the conclusion this must be a field widely explored before.
I would appreciate if someone shares any good technical resource on how to implement this.
Thanks in advance.
Solution
- http://humanized.com/weblog/2007/03/30/what_makes_a_good_autocomplete/ --
- http://social.msdn.microsoft.com/Forums/en-US/vblanguage/thread/2ccb37b9-c7e1-4113-86ac-ad3d33b4b4b1/ -- in the .Net world
- A nasty patent on autocompletion approach (still possibly worth reading for the theory) http://www.patentstorm.us/patents/5845300/description.html
- http://ask.metafilter.com/91068/Fuzzy-text-completion-algorithm for a high level discussion on strategies to take.
OTHER TIPS
Check out this blog on implementing autocomplete using GWT:
http://jroller.com/glongman/entry/gwt_autocompleter
But I would recommend you first start with something very simple on your own to grasp how the implementation is done. I'd start with a Trie, maybe even stored completely on the client, then progress to optimizing with server queries if you think they're necessary.
Autocomplete is usually implemented using one of the following:
- Trees. By indexing the searchable text in a tree structure (prefix tree, suffix tree, dawg, etc..) one can execute very fast searches at the expense of memory storage. The tree traversal can be adapted for approximate matching.
- Pattern Partitioning. By partitioning the text into tokens (ngrams) one can execute searches for pattern occurrences using a simple hashing scheme.
- Filtering. Find a set of potential matches and then apply a sequential algorithm to check each candidate.
A couple of papers about the subject:
- Bořivoj Melichar. Approximate String Matching by Finite Automata;
- Gonzalo Navarro. A Guided Tour to Approximate String Matching;
- Leonid Boytsov. Indexing Methods for Approximate Dictionary Searching: Comparative Analysis;
- Marios Hadjieleftheriou and Divesh Srivastava. Approximate String Processing;
- Surajit Chaudhuri and Raghav Kaushik. Extending Autocompletion To Tolerate Errors;
Take a look at completely, a Java autocomplete library.