I'm guessing you are using StandardAnalyzer when indexing your terms, and then searching either without any analysis or with a different analyzer.
The 2.9 StandardAnalyzer (ClassicAnalyzer, as of version 3.1) has some interesting behavior around hyphens. To quote the StandardTokenizer documentation:
Splits words at hyphens, unless there's a number in the token, in which case the whole token is interpreted as a product number and is not split.
So, two hyphenated words (or any collection of letters) will be split into separate tokens, whereas any number thrown into the mix causes the whole thing to be interpreted as a product number and indexed as a single token, hyphens and all:
- "high-quality" --> "high" and "quality"
- "ab-cd" ---------> "ab" and "cd"
- "30-40" ---------> "30-40"
- "ab-c4" ---------> "ab-c4"
- "30 40" ---------> "30" and "40"
So, if you construct a TermQuery for "high-quality" on such an analyzed field, you will get no results (though you would if using the QueryParser with the same analyzer). When searching for "30-40", a TermQuery for "30-40" will be an exact match, but no matches will be found for either "30" or "40".
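To make the mismatch concrete, here is a rough plain-Java sketch that imitates the rule quoted above and the exact-term lookup a TermQuery performs. It does not use Lucene at all: the class HyphenRuleSketch and the methods tokenize and exactTermMatch are hypothetical names of my own, and the splitting logic is a deliberate simplification (whitespace-separated chunks, no lowercasing, no other punctuation handling), not the real tokenizer grammar.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class HyphenRuleSketch {

    // Simplified imitation of the documented rule, NOT Lucene's actual grammar:
    // a whitespace-separated chunk containing any digit is kept whole,
    // otherwise it is split at hyphens.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            if (chunk.isEmpty()) {
                continue;
            }
            boolean hasDigit = chunk.chars().anyMatch(Character::isDigit);
            if (hasDigit) {
                tokens.add(chunk); // treated as a "product number"
            } else {
                for (String part : chunk.split("-")) {
                    if (!part.isEmpty()) {
                        tokens.add(part);
                    }
                }
            }
        }
        return tokens;
    }

    // Stand-in for a TermQuery: an exact lookup with no analysis of the query text.
    public static boolean exactTermMatch(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        // "Index" two values using the simplified rule.
        Set<String> indexed = new LinkedHashSet<>(tokenize("high-quality 30-40"));
        System.out.println(indexed); // [high, quality, 30-40]

        // Unanalyzed lookups behave like the TermQuery cases described above.
        System.out.println(exactTermMatch(indexed, "high-quality")); // false
        System.out.println(exactTermMatch(indexed, "30-40"));        // true
        System.out.println(exactTermMatch(indexed, "30"));           // false

        // Analyzing the query text the same way as the indexed text finds the terms.
        System.out.println(indexed.containsAll(tokenize("high-quality"))); // true
    }
}
```

The last line is the point: running the query text through the same analysis as the indexed text is what makes the terms line up, which is roughly what the QueryParser does for you when given the same analyzer.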
So, I'm not sure how you are querying to run into the mismatch (perhaps using StandardAnalyzer when indexing and WhitespaceAnalyzer when querying?), but hopefully that points you in the right direction.