Question

I have a strange issue with Lucene.NET 2.9: if I search for high-quality, it doesn't find any results. I found that the hyphen character (-) is a problem for Lucene, so I searched for high quality instead and it worked perfectly.

When I search for 30-40 it shows results, but a search for 30 40 shows none.

The second scenario contradicts the first one. I guess the second one is related to the text being numeric, but I couldn't find anything about it on the web.

Solution

I'm guessing you are using StandardAnalyzer when indexing your terms, and then are searching without analysis in some form, or with a different form of analysis.

The 2.9 StandardAnalyzer (ClassicAnalyzer, as of version 3.1) has some interesting behavior around hyphens. To quote the StandardTokenizer documentation:

Splits words at hyphens, unless there's a number in the token, in which case the whole token is interpreted as a product number and is not split.

So, two hyphenated words (or any collection of letters) will be split into separate tokens, while a number anywhere in the mix causes the whole thing to be interpreted as a product number and indexed as a single token, hyphens and all:

  • "high-quality" --> "high" and "quality"
  • "ab-cd" ---------> "ab" and "cd"
  • "30-40" ---------> "30-40"
  • "ab-c4" ---------> "ab-c4"
  • "30 40" ---------> "30" and "40"
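
The rule behind that list can be sketched in a few lines of Python. This is only a simplified model of the behavior quoted above, not Lucene's actual tokenizer grammar (which also handles e-mail addresses, acronyms, stop words and more); the function name classic_like_tokens is made up for illustration.

```python
def classic_like_tokens(text):
    """Simplified model of the quoted rule; NOT Lucene's real grammar."""
    tokens = []
    # StandardAnalyzer also lowercases terms, so the sketch does the same.
    for chunk in text.lower().split():
        if "-" in chunk and any(ch.isdigit() for ch in chunk):
            # A digit anywhere in a hyphenated chunk: keep it whole,
            # as if it were a product number.
            tokens.append(chunk)
        else:
            # No digit: hyphens behave like word separators.
            tokens.extend(part for part in chunk.split("-") if part)
    return tokens


for sample in ["high-quality", "ab-cd", "30-40", "ab-c4", "30 40"]:
    print(sample, "->", classic_like_tokens(sample))
```

Running it on the five inputs from the list reproduces the same splits, which makes it easy to check other search strings against the rule before blaming the index.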

So, if you construct a TermQuery for "high-quality" on such an analyzed field, you will get no results, because the index only contains the separate terms "high" and "quality" (though you would get results using the QueryParser with the same analyzer). When searching for "30-40", the TermQuery for "30-40" will be an exact match for the indexed token, but no matches will be found for either "30" or "40".
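
To make the mismatch concrete, here is a toy inverted index in Python. It is a sketch under loud assumptions: analyze is the same simplified stand-in for the analyzer rule (not Lucene code), term_lookup imitates a TermQuery by using the search string verbatim, and analyzed_lookup imitates query-time analysis by requiring every resulting token to match (the real QueryParser would build a phrase query instead). All names and the two sample documents are hypothetical.

```python
def analyze(text):
    """Simplified stand-in for the analyzer rule; NOT Lucene's real grammar."""
    tokens = []
    for chunk in text.lower().split():
        if "-" in chunk and any(ch.isdigit() for ch in chunk):
            tokens.append(chunk)  # digit present: keep hyphenated chunk whole
        else:
            tokens.extend(part for part in chunk.split("-") if part)
    return tokens


def build_index(docs):
    """Map each analyzed token to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_lookup(index, term):
    """Like a TermQuery: the term is looked up as-is, with no analysis."""
    return index.get(term, set())


def analyzed_lookup(index, text):
    """Analyze the query text first, then require every token to match."""
    tokens = analyze(text)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


docs = {1: "high-quality paper", 2: "sizes 30-40"}
index = build_index(docs)

print(term_lookup(index, "high-quality"))      # raw term is not in the index
print(analyzed_lookup(index, "high-quality"))  # analyzed query finds doc 1
print(term_lookup(index, "30-40"))             # raw term matches doc 2
print(analyzed_lookup(index, "30 40"))         # "30" and "40" were never indexed
```

The point of the sketch is only that index-time and query-time text must pass through the same analysis; once they do, both hyphen cases behave consistently.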

So, I'm not sure how you are querying to run into the mismatch there (perhaps using StandardAnalyzer when indexing and WhitespaceAnalyzer when querying?), but hopefully that points you in the right direction.

OTHER TIPS

You need to URL-encode the "-" sign when passing it as a URL parameter. I think it will work fine then.

Licensed under: CC-BY-SA with attribution