Question

I want to use phonetic searching in Hibernate Search. The problem is that exact matches are not ranked at the top of the search results. For example, a search for "john" returns this result list:

  • jon
  • john
  • jone

I would have expected 'john' to be listed at the top.

I defined my Analyzer in the following way:

    @AnalyzerDef(name = "phonetic",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = StandardFilterFactory.class),
            @TokenFilterDef(factory = PhoneticFilterFactory.class, params = {
                @Parameter(name = "encoder", value = "DoubleMetaphone"),
                @Parameter(name = "inject", value = "true")
            })
        })
    @Analyzer(definition = "phonetic")
    public class User {
        @Field(index = Index.TOKENIZED, store = Store.YES)
        private String firstname;

        @Field(index = Index.TOKENIZED, store = Store.YES)
        private String lastname;
    }

The search is done with this code:

    String[] fields = new String[] { "firstname", "lastname" };
    MultiFieldQueryParser parser = new MultiFieldQueryParser(fields,
            sf.getAnalyzer("phonetic"));

It would be great if you could give me any hint on how this ranking could be achieved. I searched on Google, but I only found that I would have to implement this myself, using query expansion to boost exact matches above phonetic matches. Thank you very much in advance for helping me. I am using Hibernate Search 3.1 together with Solr 1.3.

Br, Shane

Solution

Your query should work as you specify. Since you specify inject=true on your PhoneticFilter, you should indeed get more term matches on an exact match (that is, both a metaphone match, and a plain text match), and this bears out as far as my testing is concerned.

The problem I do see is that your analysis leaves you with case-sensitive searching for exact matches. If you index "John" and search for "john", the phonetic matching will work out just fine, but you'll miss the exact match due to the case sensitivity.

Simply adding a LowerCaseFilter to your filter chain should fix that. I would recommend adding it directly above your PhoneticFilter, like:

    filters = {
        @TokenFilterDef(factory = StandardFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = PhoneticFilterFactory.class, params = {
            @Parameter(name = "encoder", value = "DoubleMetaphone"),
            @Parameter(name = "inject", value = "true")
        })
    }

The positioning above the PhoneticFilterFactory maintains the metaphones in uppercase, which not only follows convention, but also ensures that the metaphone codes and plain-text will not match each other. Can't think of any cases where that would be a concern, actually, but seems nice anyway.
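
To see why the missing lowercase step matters, here is a toy sketch in plain Java, with no Lucene involved. Everything in it is an assumption made for illustration: `crudeKey` is a made-up stand-in for DoubleMetaphone, and `analyze` only mimics what inject=true does, namely emitting the original token plus its phonetic code. Counting shared tokens is a rough proxy for how many terms match, not Lucene's actual scoring.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

class InjectSketch {

    // Hypothetical stand-in for DoubleMetaphone: uppercase the word, keep the
    // first letter, drop later vowels and 'H', and collapse adjacent duplicates.
    static String crudeKey(String word) {
        String upper = word.toUpperCase(Locale.ROOT);
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < upper.length(); i++) {
            char c = upper.charAt(i);
            boolean skip = i > 0 && "AEIOUYH".indexOf(c) >= 0;
            boolean repeat = key.length() > 0 && key.charAt(key.length() - 1) == c;
            if (!skip && !repeat) {
                key.append(c);
            }
        }
        return key.toString();
    }

    // Mimics inject=true: the token itself plus its (uppercase) phonetic code.
    static Set<String> analyze(String text, boolean lowercase) {
        String token = lowercase ? text.toLowerCase(Locale.ROOT) : text;
        Set<String> tokens = new LinkedHashSet<>();
        tokens.add(token);
        tokens.add(crudeKey(token));
        return tokens;
    }

    // Number of tokens the query and the indexed value have in common.
    static int sharedTokens(String query, String indexed, boolean lowercase) {
        Set<String> shared = analyze(query, lowercase);
        shared.retainAll(analyze(indexed, lowercase));
        return shared.size();
    }

    public static void main(String[] args) {
        // Without lowercasing, only the phonetic code matches.
        System.out.println(sharedTokens("john", "John", false)); // prints 1
        // With lowercasing, the plain token matches as well.
        System.out.println(sharedTokens("john", "John", true));  // prints 2
        // A merely similar-sounding name still matches on the code alone.
        System.out.println(sharedTokens("john", "jon", true));   // prints 1
    }
}
```

In this toy model, the exact match only pulls ahead of "jon" once both sides are lowercased before the phonetic step.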

OTHER TIPS

Both jon and john are exactly the same from the point of view of a phonetic analyzer. Hibernate Search allows you to define multiple analyzers, and you can also index the same property multiple times using the plural annotation @Fields.

Let's say you index the firstname in two fields named firstname_phonetic and firstname_standard. You can then create two Query instances, one targeting each field, and combine them in a BooleanQuery using SHOULD clauses. The scorer will then combine the scores from both, so that exact matches get ranked higher.
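
The effect of combining the two clauses can be illustrated with a toy model in plain Java. It is only a sketch under stated assumptions: `crudeKey` is a made-up stand-in for the phonetic encoder, each matching clause contributes a fixed score, and the boost of 2.0 on the exact clause is an assumed value. Lucene's real scoring is more involved, but the ordering idea is the same.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class TwoFieldRankingSketch {

    // Hypothetical stand-in for a phonetic encoder: uppercase the word, keep the
    // first letter, drop later vowels and 'H', and collapse adjacent duplicates.
    static String crudeKey(String word) {
        String upper = word.toUpperCase(Locale.ROOT);
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < upper.length(); i++) {
            char c = upper.charAt(i);
            boolean skip = i > 0 && "AEIOUYH".indexOf(c) >= 0;
            boolean repeat = key.length() > 0 && key.charAt(key.length() - 1) == c;
            if (!skip && !repeat) {
                key.append(c);
            }
        }
        return key.toString();
    }

    // Sum of two optional (SHOULD-like) clauses: a clause that does not match
    // adds nothing, a clause that matches adds its score.
    static float score(String query, String indexed, float exactBoost) {
        String q = query.toLowerCase(Locale.ROOT);
        String v = indexed.toLowerCase(Locale.ROOT);
        float score = 0f;
        if (q.equals(v)) {
            score += exactBoost; // stands in for the firstname_standard clause
        }
        if (crudeKey(q).equals(crudeKey(v))) {
            score += 1.0f;       // stands in for the firstname_phonetic clause
        }
        return score;
    }

    // Returns a copy of the names sorted by descending score (stable sort).
    static List<String> rank(List<String> names, String query, float exactBoost) {
        List<String> ranked = new ArrayList<>(names);
        ranked.sort((a, b) -> Float.compare(score(query, b, exactBoost),
                                            score(query, a, exactBoost)));
        return ranked;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("jon", "john", "jone");
        System.out.println(rank(names, "john", 2.0f)); // prints [john, jon, jone]
    }
}
```

Here "john" scores on both clauses while "jon" and "jone" score only on the phonetic one, so the exact match moves to the top.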

Thanks for the answers. I now use the filter order suggested by "femtoRgon" and have defined multiple analyzers using @Fields (default and phonetic). I then combine a query on the standard field with one on the phonetic field, using different boost values (a higher boost of 2.0f on the standard field).

Thank you all for your help.

Br, Shane

Licensed under: CC-BY-SA with attribution