Question

I'm building a search function for a PHP website using Zend Lucene, and I'm having a problem. My website is a shop directory (something like that).

For example, I have a shop named "FooBar", but my visitors search for "Foo Bar" and get zero results. Also, if a shop is named "Foo Bar" and a visitor searches for "FooBar", nothing is found.

I tried searching for "foobar~" (fuzzy search), but it did not find articles named "Foo Bar".

Is there a special way to build the index or to write the query?


Solution

Option 1: Break the input query string into two parts at various points and search for both parts. For example, in this case the query would be (+fo +obar) OR (+foo +bar) OR (+foob +ar). The problem is that this tokenization assumes there are exactly two tokens in the input query string. Also, you may get extra, possibly irrelevant, results, such as the results of (+foob +ar).
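A minimal sketch of Option 1, in plain Java with no Lucene dependency: it only builds the query string, and handing that string to a real query parser is left out. The class name `SplitPointQueries` and the `minPart` parameter (the shortest allowed piece on each side of a split) are hypothetical. As a design choice, the unsplit word is kept as the first alternative so that a shop actually named "FooBar" still matches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: generates Lucene-style "(+left +right)" clauses for
// every two-way split of a single-word input.
class SplitPointQueries {

    // One clause per split point that leaves at least minPart characters
    // on each side, e.g. "foobar" with minPart 2 -> fo|obar, foo|bar, foob|ar.
    static List<String> splitClauses(String input, int minPart) {
        List<String> clauses = new ArrayList<>();
        for (int i = minPart; i <= input.length() - minPart; i++) {
            String left = input.substring(0, i);
            String right = input.substring(i);
            clauses.add("(+" + left + " +" + right + ")");
        }
        return clauses;
    }

    // ORs the unsplit word together with every split clause.
    static String buildQuery(String input, int minPart) {
        List<String> parts = new ArrayList<>();
        parts.add(input);
        parts.addAll(splitClauses(input, minPart));
        return String.join(" OR ", parts);
    }

    public static void main(String[] args) {
        // prints: foobar OR (+fo +obar) OR (+foo +bar) OR (+foob +ar)
        System.out.println(buildQuery("foobar", 2));
    }
}
```

Raising `minPart` cuts down on clauses like (+foob +ar) that tend to produce the irrelevant hits mentioned above, at the cost of missing very short name parts.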

Option 2: Use n-gram tokenization at both index time and query time. With bigrams (n = 2), the indexed tokens for "foo bar" would be fo, oo, ba, ar, and searching for foobar would produce the tokens fo, oo, ob, ba, ar. Using OR as the operator puts the documents with the most n-gram matches at the top. This can be achieved with NGramTokenizer.
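To make the bigram example concrete, here is a small plain-Java sketch that imitates what an n-gram analyzer plus an OR query would do. It is not Lucene's NGramTokenizer, and `overlap` is a simplified count of shared distinct n-grams rather than Lucene's actual scoring; the class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration of n-gram matching, not a Lucene analyzer.
class NGramMatch {

    // Lowercases the text, splits it on whitespace, and cuts each word into
    // character n-grams (n-grams never span the space between words).
    static Set<String> ngrams(String text, int n) {
        Set<String> grams = new LinkedHashSet<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            for (int i = 0; i + n <= word.length(); i++) {
                grams.add(word.substring(i, i + n));
            }
        }
        return grams;
    }

    // Number of distinct query n-grams that also appear in the indexed name;
    // ranking by this count mimics "most n-gram matches at the top".
    static int overlap(String indexed, String query, int n) {
        Set<String> shared = new LinkedHashSet<>(ngrams(query, n));
        shared.retainAll(ngrams(indexed, n));
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(ngrams("foo bar", 2));            // [fo, oo, ba, ar]
        System.out.println(ngrams("foobar", 2));             // [fo, oo, ob, ba, ar]
        System.out.println(overlap("Foo Bar", "foobar", 2)); // 4
    }
}
```

"Foo Bar" shares 4 of the 5 query bigrams with "foobar", while an unrelated name shares none, which is why the OR query surfaces the right shop first.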

OTHER TIPS

Manually add index entries for the most common name confusions, and get your customers to type them in on a special form.
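One way to apply this tip is to index each shop's name together with its hand-entered variants. The sketch below assumes a hypothetical in-memory alias table (`ALIASES`) and helper (`searchableText`); in a real site the aliases would come from that customer form and be stored with each shop. It uses `Map.of`/`List.of`, so it needs Java 9 or later.

```java
import java.util.List;
import java.util.Map;

// Hypothetical alias table and helper, for illustration only.
class ShopAliases {

    // Hand-maintained variants, keyed by the shop's official name.
    static final Map<String, List<String>> ALIASES = Map.of(
        "Foo Bar", List.of("FooBar")
    );

    // Text to put into the indexed field: the real name plus every alias,
    // so a search for either spelling hits the same document.
    static String searchableText(String shopName) {
        StringBuilder sb = new StringBuilder(shopName);
        for (String alias : ALIASES.getOrDefault(shopName, List.of())) {
            sb.append(' ').append(alias);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(searchableText("Foo Bar")); // Foo Bar FooBar
    }
}
```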

Have you tried "*foo* AND *bar*" or "*foo* OR *bar*"? That works in Ferret, which I've read is based on Lucene.

If you don't care about performance, use a WildcardQuery (it is significantly slower than a plain term query):

new WildcardQuery( new Term( "propertyName", "Foo?Bar" ) );

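Use '*' for zero or more characters and '?' for exactly one character. So "Foo?Bar" matches "Foo Bar" but not "FooBar"; to match both, use "Foo*Bar". The pattern is compared against individual indexed terms and is not analyzed, so it only works if the field is indexed untokenized (the whole name as one term) and the case matches.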
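The difference between '?' and '*' is easy to check with a plain-Java imitation of wildcard matching on a single term. This is an illustration that converts the pattern to a regular expression, not Lucene's WildcardQuery implementation; `WildcardDemo` is a hypothetical name. Like an unanalyzed term, the comparison is case-sensitive.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of wildcard semantics on one term:
// '*' = zero or more characters, '?' = exactly one character.
class WildcardDemo {

    static boolean matches(String pattern, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                // Every other character is matched literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // Whole-term, case-sensitive match.
        return Pattern.matches(regex.toString(), term);
    }

    public static void main(String[] args) {
        System.out.println(matches("Foo?Bar", "Foo Bar")); // true
        System.out.println(matches("Foo?Bar", "FooBar"));  // false
        System.out.println(matches("Foo*Bar", "FooBar"));  // true
        System.out.println(matches("Foo*Bar", "Foo Bar")); // true
        System.out.println(matches("*foo*", "foobar"));    // true
    }
}
```

The last line also shows why "*foo* AND *bar*" can find a shop indexed as the single term "foobar": both patterns match that one term.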

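If performance is important, try a BooleanQuery built from plain term queries instead, since a wildcard query has to be expanded against many terms in the index.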

Licensed under: CC-BY-SA with attribution