Question

I am using Zend_Search_Lucene to search for keywords in documents. One of the documents contains this text: "This taught me a valuable lesson in time management as I still had to attend lectures and tutorials during the day. I enjoyed improving my telephone manner and learning to deal with different reactions to my requests for donations."

Now, if I search for 'valuable lesson on time management', it returns nothing. I am using the code below to search:

    // Use the case-insensitive text + numbers analyzer for queries
    Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive());

    $index = new Zend_Search_Lucene('/home/project/mgh/data/search_file/lucene.customer.index');

    // setDefaultSearchField(), not getDefaultSearchField(), sets the field
    Zend_Search_Lucene::setDefaultSearchField('contents');

    // Phrase query: 'on' where the document has 'in'
    $results = $index->find('contents:"valuable lesson on time management"');

    $this->count = count($results);

In the example above, the only mismatch is 'on' in place of 'in'; the remaining words match. How can I get a result count when most of the words match, even if a few do not?

Thanks for suggestions.

Reference: http://framework.zend.com/manual/en/zend.search.lucene.query-language.html


Solution

The key here may be stopwords. If 'in' and 'on' were defined as stopwords (words Lucene ignores because they are too common), then your query 'valuable lesson on time management' would match the 'valuable lesson in time management' section of your document text. For that to work, the stopwords have to be dropped both when the document is indexed and when the query is parsed.
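As a rough sketch of the stopword idea, assuming Zend Framework 1 is on the include path: the stopword list below is only an illustration, and termsOf() is a hypothetical helper added to show what the analyzer emits. It attaches a stopword filter to the same analyzer used in the question and checks that both phrasings reduce to the same terms.

```php
<?php
// Assumption: Zend Framework 1 is on the include path.
require_once 'Zend/Loader/Autoloader.php';
Zend_Loader_Autoloader::getInstance();

// Illustrative stopword list; adjust it to your own data.
$stopWords = array('in', 'on', 'a', 'an', 'the', 'to', 'as');

$analyzer = new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive();
$analyzer->addFilter(new Zend_Search_Lucene_Analysis_TokenFilter_StopWords($stopWords));

// Make this the analyzer used for both indexing and query parsing.
Zend_Search_Lucene_Analysis_Analyzer::setDefault($analyzer);

// Hypothetical helper: list the terms the analyzer produces for a string.
function termsOf($analyzer, $text)
{
    $terms = array();
    foreach ($analyzer->tokenize($text) as $token) {
        $terms[] = $token->getTermText();
    }
    return $terms;
}

// With 'in' and 'on' filtered out, both phrases yield the same term list.
print_r(termsOf($analyzer, 'valuable lesson in time management'));
print_r(termsOf($analyzer, 'valuable lesson on time management'));
```

Because the terms stored in an existing index were produced by whatever analyzer was active at indexing time, the index would most likely need to be rebuilt with this analyzer before the phrase query starts matching.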

OTHER TIPS

The problem is not with zend_search_lucene itself, but with how Lucene is indexing your data. I recommend reading "Analyzers, Tokenizers, and Token Filters" in the Solr documentation to understand how it works. It would also help if you posted your schema.xml (where you define which fields are indexed and how).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow