I suspect that field is not analyzed when indexed.
So, with the first query, you are getting hits from the wildcard query. *berry* matches all of the examples you've given. *bery* doesn't match any of the documents, though, since it's not actually a substring of any of them.
For the fuzzy query, terms are compared by edit distance (Damerau–Levenshtein distance). An edit distance of two is the default maximum for a match.
bery to berry - edit distance: 1
bery to wild berry - edit distance: 6
bery to strawberry - edit distance: 6
bery to blueberry - edit distance: 5
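The distances above can be checked by hand or with a small program. The following is a minimal sketch in plain Java, not Lucene's own implementation: the class name EditDistanceDemo and the method distance are made up for illustration, and it computes the optimal string alignment variant of Damerau-Levenshtein distance, which I am assuming is close enough to what the fuzzy query uses for these examples.

```java
public class EditDistanceDemo {
    // Optimal string alignment distance: an insertion, deletion, substitution,
    // or swap of two adjacent characters each costs one edit.
    // Illustrative only; this is not Lucene's implementation.
    public static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // delete all of a
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insert all of b
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "ab" <-> "ba"
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        String query = "bery";
        String[] terms = {"berry", "wild berry", "strawberry", "blueberry"};
        for (String term : terms) {
            System.out.println(query + " to " + term
                    + " - edit distance: " + distance(query, term));
        }
    }
}
```

Only berry falls within the default maximum of two edits, which is why the other three documents are not returned when the whole field value is indexed as one term.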
This could be handled in part by using an analyzer, instead of indexing the entire string as a single token. The standard analyzer would split wild berry up into the tokens wild and berry, and you could expect a fuzzy match on that.
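To see why tokenizing helps, here is a rough sketch in plain Java. It is an approximation under stated assumptions: lowercasing and splitting on whitespace stands in for what an analyzer would do, plain Levenshtein distance stands in for the fuzzy comparison, and the names TokenFuzzyDemo, levenshtein and fuzzyMatchesAnyToken are hypothetical, not part of Lucene.

```java
public class TokenFuzzyDemo {
    // Plain Levenshtein distance (insert, delete, substitute), kept to two rows.
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(prev[j] + 1, cur[j - 1] + 1),
                        prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    // Stand-in for analysis: lowercase, split on whitespace, then compare the
    // query against each token instead of against the whole field value.
    public static boolean fuzzyMatchesAnyToken(String query, String fieldValue, int maxEdits) {
        for (String token : fieldValue.toLowerCase().split("\\s+")) {
            if (levenshtein(query, token) <= maxEdits) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] values = {"berry", "wild berry", "strawberry", "blueberry"};
        for (String value : values) {
            System.out.println(value + ": " + fuzzyMatchesAnyToken("bery", value, 2));
        }
    }
}
```

With per-token comparison, wild berry now matches because its berry token is one edit away from bery, while strawberry and blueberry still do not, since each is a single token.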
As for strawberry and blueberry, a fuzzy match on berry won't work unless your analyzer somehow splits apart straw and berry. You could manually specify the terms to split apart by incorporating a SynonymFilter into your analyzer.
Another option would be to attempt to correct the query spelling before searching, using Lucene's SpellChecker.