Your query should work as you specify. Since you specify inject=true on your PhoneticFilter, you should indeed get more term matches on an exact match (that is, both a metaphone match and a plain-text match), and this bears out in my testing.
The problem I do see is that your analysis leaves you with case-sensitive searching for exact matches. If you index "John" and search for "john", the phonetic matching will work out just fine, but you'll miss the exact match because the plain-text terms differ in case.
Simply adding a LowerCaseFilter to your filter chain should fix that. I would recommend adding it directly above your PhoneticFilter, like:
    filters = {
        @TokenFilterDef(factory = StandardFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = PhoneticFilterFactory.class, params = {
            @Parameter(name = "encoder", value = "DoubleMetaphone"),
            @Parameter(name = "inject", value = "true")
        })
    }
Positioning it above the PhoneticFilterFactory keeps the metaphone codes in uppercase, which not only follows convention but also ensures that the metaphone codes and the plain text will not match each other. I can't think of any case where that would be a concern, actually, but it seems nice anyway.
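To make the effect concrete, here is a rough plain-Java sketch of what the chain does to a single token. It does not use Lucene at all: `PhoneticChainSketch`, `toyEncode`, `analyze` and `sharedTerms` are hypothetical names invented for this illustration, and `toyEncode` is a crude stand-in for a phonetic encoder, not DoubleMetaphone. It only mimics the two properties that matter here: the code comes out in uppercase, and with inject=true the original token is kept next to it.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class PhoneticChainSketch {

    // Toy stand-in for a phonetic encoder (NOT DoubleMetaphone):
    // uppercases the token, then drops vowels and 'H' after the first letter.
    public static String toyEncode(String token) {
        String upper = token.toUpperCase(Locale.ROOT);
        if (upper.isEmpty()) {
            return upper;
        }
        return upper.charAt(0) + upper.substring(1).replaceAll("[AEIOUH]", "");
    }

    // Mimics the chain for one token: optional lowercasing first, then
    // phonetic encoding with inject=true (the plain token is kept as well).
    public static Set<String> analyze(String token, boolean lowercase) {
        String plain = lowercase ? token.toLowerCase(Locale.ROOT) : token;
        Set<String> terms = new LinkedHashSet<>();
        terms.add(plain);            // injected plain-text term
        terms.add(toyEncode(plain)); // uppercase phonetic code
        return terms;
    }

    // Counts the terms that the indexed token and the query token share.
    public static int sharedTerms(String indexed, String queried, boolean lowercase) {
        Set<String> shared = new LinkedHashSet<>(analyze(indexed, lowercase));
        shared.retainAll(analyze(queried, lowercase));
        return shared.size();
    }

    public static void main(String[] args) {
        // Without lowercasing, only the phonetic code is shared.
        System.out.println(sharedTerms("John", "john", false)); // prints 1
        // With lowercasing first, the plain-text term is shared too.
        System.out.println(sharedTerms("John", "john", true));  // prints 2
    }
}
```

Because the lowercasing happens before the encoding step, the plain terms are always lowercase and the codes always uppercase, so the two kinds of term stay distinct in this sketch.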