Question

I'd like to change the minimum similarity before searching the index. This is what I do:

QueryParser parser = new QueryParser(Version.LUCENE_43, "field", standarAnalyzer);
System.out.println("similarity before: " + parser.getFuzzyMinSim());
parser.setFuzzyMinSim(0.6f);
System.out.println("similarity after: " + parser.getFuzzyMinSim());
Query query = parser.parse(inputString); // inputString is given by the user
System.out.println("Querystring: " + query.toString());

Now, when inputString = "something~", I get this output:

similarity before: 2.0
similarity after: 0.5
Querystring: field:something~2 // Why 2!?

My questions:

  1. Why is the similarity 2.0 at the beginning (I thought the default was 0.5)?
  2. Why is it still 2.0 after calling the setFuzzyMinSim method?

Solution

FuzzyQuery changed significantly in Lucene 4. The number after the '~' is now a maximum edit distance, not a minimum similarity. I'm not entirely clear on how FuzzyMinSim is mapped to a maximum edit distance when the standard query parser generates a FuzzyQuery. Note that using DefaultFuzzyMinSim is deprecated in 4.x.

The default maximum edit distance is 2. The FuzzyQuery class does not support edit distances greater than 2, so the standard query parser does not support them either.
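
As a rough illustration of how a fractional minimum similarity could still end up as an edit distance of 2: as far as I recall, Lucene 4.x has a deprecated helper, FuzzyQuery.floatToEdits, that scales (1 - similarity) by the term length and caps the result at 2. The sketch below is a standalone re-implementation of that idea under that assumption, not the library's own code. The class name FuzzyEditsSketch, the method toMaxEdits and the constant MAX_EDITS are made up for this example.

```java
public class FuzzyEditsSketch {

    // Assumed cap: FuzzyQuery in 4.x supports at most 2 edits.
    static final int MAX_EDITS = 2;

    // Hypothetical helper: maps a "minimum similarity" style value to a
    // maximum edit distance for a term of the given length.
    static int toMaxEdits(float minimumSimilarity, int termLength) {
        if (minimumSimilarity >= 1f) {
            // Values >= 1 are treated as an edit distance already.
            return (int) Math.min(minimumSimilarity, MAX_EDITS);
        } else if (minimumSimilarity == 0.0f) {
            // 0 means exact match here, not an unlimited number of edits.
            return 0;
        } else {
            // Fractions are scaled by the term length, then capped.
            int edits = (int) ((1D - minimumSimilarity) * termLength);
            return Math.min(edits, MAX_EDITS);
        }
    }

    public static void main(String[] args) {
        int len = "something".length(); // 9 characters
        System.out.println("2.0 -> " + toMaxEdits(2.0f, len));
        System.out.println("0.6 -> " + toMaxEdits(0.6f, len));
        System.out.println("0.5 -> " + toMaxEdits(0.5f, len));
        System.out.println("0.8 -> " + toMaxEdits(0.8f, len));
    }
}
```

Under this assumption, both 0.5 and 0.6 give more than 2 edits for a 9-character term before the cap is applied, so the query would print as field:something~2 either way. A stricter value such as 0.8 would come out as 1 for the same term.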

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow