How to control scoring and ordering in Zend_Search_Lucene, so one field is more important than the other?

StackOverflow https://stackoverflow.com/questions/15754602

Question

From what I understand after reading the documentation (especially the scoring part), every field I add has the same level of importance when search results are scored. I have the following code:

protected static $_indexPath = 'tmp/search/indexes/projects';

public static function createSearchIndex()
{
    $_index = new Zend_Search_Lucene(APPLICATION_PATH . self::$_indexPath, true);

    $_projects_stmt = self::getProjectsStatement();
    $_count = 0;

    while ($row = $_projects_stmt->fetch()) {

        $doc = new Zend_Search_Lucene_Document();

        $doc->addField(Zend_Search_Lucene_Field::text('name', $row['name']));
        $doc->addField(Zend_Search_Lucene_Field::text('description', $row['description']));
        $doc->addField(Zend_Search_Lucene_Field::unIndexed('projectId', $row['id']));

        $_index->addDocument($doc);
    }

    $_index->optimize();
    $_index->commit();
}

The code is simple: I generate the index from data fetched from the database and save it in the specified location.

I have looked in many places, because the behavior I want is for the name field to be more important than description (let's say 75% and 25%). So if I search for a phrase that is found in the description of the first document and in the name of the second document, the second document should get a score 3 times higher and show up higher in my list.

Is there any way to control scoring/ordering in this way?


Solution

I figured it out based on this documentation page. You need to create a new Similarity algorithm class and override the lengthNorm method. I copied this method from the Default class, added a $multiplier variable, and set its value when needed (for the field I want to boost):

class Zend_Search_Lucene_Search_Similarity_Projects extends Zend_Search_Lucene_Search_Similarity_Default
{
    /**
     * @param string $fieldName
     * @param integer $numTerms
     * @return float
     */
    public function lengthNorm($fieldName, $numTerms)
    {
        if ($numTerms == 0) {
            return 1E10;
        }

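        // Fields not singled out below keep the default norm.
        $multiplier = 1;

        // Boost matches found in the "name" field.
        if ($fieldName == 'name') {
            $multiplier = 3;
        }

        // Dividing the term count by the multiplier raises the norm by sqrt($multiplier).
        return 1.0 / sqrt($numTerms / $multiplier);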
    }
}
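Because the multiplier sits inside the square root, a multiplier of 3 raises the length norm by a factor of sqrt(3), roughly 1.73, rather than 3. The sketch below restates the formula from lengthNorm() as a standalone function so the arithmetic can be checked without Zend Framework; lengthNormSketch() and the sample term count are assumptions made for illustration, not part of the library. It suggests that a multiplier of 9 would be needed for the norm to be exactly 3 times larger. The final score also depends on other factors such as term frequency and inverse document frequency, so this ratio applies to the norm factor only.

```php
<?php
// Standalone copy of the formula used in lengthNorm() above.
// lengthNormSketch() is a hypothetical helper, not a Zend_Search_Lucene API.
function lengthNormSketch($numTerms, $multiplier = 1)
{
    if ($numTerms == 0) {
        return 1E10;
    }

    return 1.0 / sqrt($numTerms / $multiplier);
}

$numTerms = 4; // assumed example: a field value with four terms

$plain  = lengthNormSketch($numTerms);    // 1 / sqrt(4)     = 0.5
$times3 = lengthNormSketch($numTerms, 3); // 1 / sqrt(4 / 3), about 0.866
$times9 = lengthNormSketch($numTerms, 9); // 1 / sqrt(4 / 9) = 1.5

// Ratio of the boosted norm to the unboosted norm: sqrt(3) and sqrt(9).
printf("multiplier 3 -> norm ratio %.3f\n", $times3 / $plain);
printf("multiplier 9 -> norm ratio %.3f\n", $times9 / $plain);
```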

Then the only thing left to do (an edit of the code from the question) is to set your new Similarity class as the default just before indexing:

protected static $_indexPath = 'tmp/search/indexes/projects';

public static function createSearchIndex()
{
    Zend_Search_Lucene_Search_Similarity::setDefault(new Zend_Search_Lucene_Search_Similarity_Projects());

    $_index = new Zend_Search_Lucene(APPLICATION_PATH . self::$_indexPath, true);

    $_projects_stmt = self::getProjectsStatement();
    $_count = 0;

    while ($row = $_projects_stmt->fetch()) {

        $doc = new Zend_Search_Lucene_Document();

        $doc->addField(Zend_Search_Lucene_Field::text('name', $row['name']));
        $doc->addField(Zend_Search_Lucene_Field::text('description', $row['description']));
        $doc->addField(Zend_Search_Lucene_Field::unIndexed('projectId', $row['id']));

        $_index->addDocument($doc);
    }

    $_index->optimize();
    $_index->commit();
}

I wanted to give the name field an extra boost, but you can do the same with any field.
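If several fields need different weights, the single if statement could be replaced by a lookup table. The following is a minimal sketch of that idea as a standalone function; lengthNormForField(), the 'tags' field, and the multiplier values are hypothetical and only illustrate the pattern.

```php
<?php
// Hypothetical helper: same formula as lengthNorm(), with per-field multipliers.
function lengthNormForField($fieldName, $numTerms, array $multipliers)
{
    if ($numTerms == 0) {
        return 1E10;
    }

    // Fields without an entry in the table keep the default multiplier of 1.
    $multiplier = isset($multipliers[$fieldName]) ? $multipliers[$fieldName] : 1;

    return 1.0 / sqrt($numTerms / $multiplier);
}

// Example table; 'tags' is an assumed field that the original index does not have.
$multipliers = array('name' => 3, 'tags' => 2);

$nameNorm = lengthNormForField('name', 4, $multipliers);        // boosted field
$descNorm = lengthNormForField('description', 4, $multipliers); // default norm

printf("name: %.3f, description: %.3f\n", $nameNorm, $descNorm);
```

Inside a Similarity subclass the signature of lengthNorm($fieldName, $numTerms) stays fixed, so the table would have to live in a class property rather than be passed as a third argument. Since the norm is computed at indexing time, the index has to be rebuilt after the table changes.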

Licensed under: CC-BY-SA with attribution