Question

I would like to use Lucene to index and search text. The text can contain mistyped words, names, etc. What is the simplest way to get Lucene to find a document containing

"this is Licene" 

when user searches for

"Lucene"? 

This is only for a demo app, so we need the simplest solution.


Solution

Lucene's fuzzy queries are based on Levenshtein edit distance.

Use a fuzzy query in the QueryParser, with syntax like:

Lucene~0.5

Or create a FuzzyQuery, passing in the maximum number of edits, something like:

Query query = new FuzzyQuery(new Term("field", "lucene"), 1);

Note: FuzzyQuery, in Lucene 4.x, does not support edit distances greater than 2.
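
To see why a maximum of 1 edit is enough for the example in the question, here is a minimal sketch of plain Levenshtein distance in Java. It is only an illustration of the metric, not Lucene's internal implementation, and the class and method names (EditDistanceDemo, levenshtein) are made up for this example. The inputs are lowercased by hand on the assumption that the indexed terms were lowercased by the analyzer.

```java
final class EditDistanceDemo {

    // Classic dynamic-programming Levenshtein distance: the minimum number of
    // single-character insertions, deletions, or substitutions needed to turn
    // string a into string b. Only two rows of the table are kept in memory.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two arrays for the next row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // "licene" differs from "lucene" by a single substituted character.
        System.out.println(levenshtein("licene", "lucene"));
    }
}
```

Because "licene" and "lucene" are one substitution apart, a fuzzy query on "lucene" that allows 1 edit should match the mistyped document.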

Other answers

Another option you could try is using the Lucene SpellChecker:

http://lucene.apache.org/core/6_4_0/suggest/org/apache/lucene/search/spell/SpellChecker.html

It works out of the box and is very easy to use:

  SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
  // To index a field of a user index:
  spellchecker.indexDictionary(new LuceneDictionary(my_lucene_reader, a_field));
  // To index a file containing words:
  spellchecker.indexDictionary(new PlainTextDictionary(new File("myfile.txt")));
  String[] suggestions = spellchecker.suggestSimilar("misspelt", 5);

By default, it uses LevensteinDistance, but you can provide your own customized edit distance (a StringDistance implementation).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow