elastica scoring based on regular expression using mvel

https://stackoverflow.com/questions/19870669

29-07-2022

Question

I am new to Elasticsearch, and here is the scenario I am trying to solve. I have a search input box that supports autosuggestion. The results are fetched from an Elasticsearch index that uses an ngram filter. I want to introduce scoring so that the results are ordered from most to least important. The score must be based on the following cases:

If there is a match that starts with the given string, set the score to 100
If there is a match that contains the given string but does not start with it, set the score to 10

For this purpose I implemented an Elastica script with MVEL statements that does regular expression matching: it checks whether the value on the left matches the regular expression on the right, and only then increments a variable. Unfortunately, it fails when the search string is in a non-English language, even though the value on the left is in the same language. The other problem is the second case mentioned above, which I cannot get to work.

The script works fine when a value of the name field ('one example') starts with the given word ('one'):

$testParam = mb_strtolower('one', 'utf-8');
$regexStart = '^' . $testParam . '.*$';
$ElasticaScript = new Elastica_Script(" total = 1; if(doc['name'].value ~= '{$regexStart}'){ total += 100; } return total; ");

The script does not work when a value of the name field ('one example') contains the given word ('example'): the total score stays at 1 instead of being incremented to 11.

$testParam = mb_strtolower('example', 'utf-8');
$regexStart = '^.*' . $testParam . '.*$';
$ElasticaScript = new Elastica_Script(" total = 1; if(doc['name'].value ~= '{$regexStart}'){ total += 10; } return total; ");

Finally, with the same logic, when I try to match a Greek word against a value of the name field that contains Greek letters, the increment of the total score is skipped as well.

All of this was done with Elastica in PHP. Could you please help me solve this problem? If there is another approach or solution, feel free to share it.

Thank you in advance

Solution

doc['name'].value loads the analyzed version of the field. Unless your field is mapped as not_analyzed, this will likely be very different from the original content of the field, and not useful for regex matches. The Elasticsearch docs on script fields say this only makes sense for non-analyzed or single-term fields. For example, if your content is indexed as ngrams, this value will consist of ngrams.

You can access the original text of the field using _source.field_name, and then compute your score based on that. You can still do your search as usual against the ngrams, and use the _source just for scoring.
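To illustrate why the "contains" regex cannot succeed against analyzed terms, here is a minimal PHP sketch. The ngrams() helper is hypothetical and only roughly imitates an ngram filter, and the gram sizes of 2 and 3 are an assumption, not the asker's actual index settings.

```php
<?php
// Hypothetical helper: character n-grams of a string, a rough stand-in
// for the terms an ngram filter would put into the index.
function ngrams(string $text, int $min, int $max): array
{
    $out = [];
    $len = mb_strlen($text, 'UTF-8');
    for ($n = $min; $n <= $max; $n++) {
        for ($i = 0; $i + $n <= $len; $i++) {
            $out[] = mb_substr($text, $i, $n, 'UTF-8');
        }
    }
    return $out;
}

$name  = 'one example';
$regex = '/^.*example.*$/u';

// Against the original text (what _source.name holds) the pattern matches.
var_dump(preg_match($regex, $name) === 1); // bool(true)

// Against the individual 2- and 3-character terms it never matches,
// because no single term is long enough to contain "example".
$hits = array_filter(ngrams($name, 2, 3), function ($term) use ($regex) {
    return preg_match($regex, $term) === 1;
});
var_dump(count($hits)); // int(0)
```

The same reasoning applies to any search word longer than the largest gram size: a script that only sees one indexed term at a time has nothing long enough to match.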

Here's a sample function_score query that defaults the score to _score, adds 100 if the name field starts with 'one', else adds 10 if the name field contains 'one' anywhere else. It uses _source.name to access the contents of the name field, so it runs the regex against the original text of the name field, not the ngrams calculated from it.

{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "script_score": {
        "script": "total = _score; if (_source.name ~= '^one.*') { total += 100 } else if (_source.name ~= '.*?one.*?') { total += 10 } return total"
      }
    }
  }
}
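Since the question builds its script in PHP, the query above could be assembled from a user-typed term along these lines. This is a sketch under assumptions: buildScoreQuery() is a hypothetical helper, the parameter names startsWith and contains are made up for illustration, and it assumes that script_score accepts a params map whose entries MVEL exposes as variables. preg_quote() targets PCRE syntax, which is assumed to be close enough to the regex syntax used by the script for ordinary words.

```php
<?php
// Hypothetical helper: builds the function_score body for a user-typed term.
function buildScoreQuery(string $term): array
{
    $term = mb_strtolower($term, 'UTF-8');
    // Escape regex metacharacters so the input is matched literally.
    $escaped = preg_quote($term);

    return [
        'query' => [
            'function_score' => [
                'boost_mode'   => 'replace',
                'script_score' => [
                    // Passing the patterns as params avoids quoting
                    // problems inside the script string.
                    'params' => [
                        'startsWith' => '^' . $escaped . '.*',
                        'contains'   => '.*?' . $escaped . '.*?',
                    ],
                    'script' => 'total = _score; '
                        . 'if (_source.name ~= startsWith) { total += 100 } '
                        . 'else if (_source.name ~= contains) { total += 10 } '
                        . 'return total',
                ],
            ],
        ],
    ];
}

// Print the JSON body for the term "One"; non-ASCII input stays readable.
echo json_encode(
    buildScoreQuery('One'),
    JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE
), "\n";
```

The resulting array can be serialized and sent as the request body, or passed to whatever raw-query facility the client library offers. Note that _source.name keeps its original casing, so lowercasing only the search term helps when the stored names are lowercase too.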

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow