Question

I'm trying to achieve two goals with a single Sphinx request: get results that match any word from the query, and have an exact match ranked in first place. For example, suppose I have this song search request database:

  1. Miley Cyrus Ball
  2. Miley Cyrus Wrecking
  3. Miley Cyrus

And two test queries:

  1. Miley Cyrus
  2. Miley Cyrus Wrecking Ball

If I search for "Miley Cyrus" I want to get row #3, and if I search for "Miley Cyrus Wrecking Ball" I want to get #1 or #2. I tried different combinations of matching and ranking modes but still can't get this working with a single request. With SPH_MATCH_EXTENDED2 and SPH_RANK_SPH04, my first test query works fine, returning row #3 in first place, but the second test query returns no results. With SPH_MATCH_ANY, I get partially matched results for the second test query (#2 has a slightly higher weight, which seems correct), but the first query returns 3 rows with the same weight, and #1 is on top because of the order in the DB. The only workaround I have for now is making two queries: one for an exact match and another for a partial match if the first one fails.

Also, from this article I see that all match modes except SPH_MATCH_EXTENDED2 are legacy, so what should I use for partial matching like in the example above once they are removed?


Solution

tl;dr: There is only one matching mode: Extended. Don't use any others. If you want to change which documents are included, modify the query itself (e.g. with the quorum operator). You can then pick how documents are ordered using the ranking mode.


The first thing to realise is that matching and ranking are two distinct topics.

  • Matching decides which documents are even present in the results, i.e. comparing the query against each document and answering yes/no to the question "does this document match the query?"

  • Ranking is computing a weight, so the best matches can rise to the top by sorting by weight.
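
To make the split concrete, here is a toy model of the two steps using the three rows from the question. It is only a sketch: the function names, the "at least N words" rule, and the exact-match bonus of 10 are all made up for illustration and are not how Sphinx scores documents internally.

```python
# Toy model only: NOT Sphinx's real matching or ranking code.
DOCS = {1: "Miley Cyrus Ball", 2: "Miley Cyrus Wrecking", 3: "Miley Cyrus"}


def words(text):
    """Lowercase and split on whitespace (a stand-in for real tokenising)."""
    return text.lower().split()


def matches(doc, query, min_words):
    """Matching: yes/no. Does the document share at least min_words query words?"""
    shared = set(words(doc)) & set(words(query))
    return len(shared) >= min_words


def weight(doc, query):
    """Ranking: a number. Shared words add up; an exact match gets a bonus."""
    shared = len(set(words(doc)) & set(words(query)))
    exact_bonus = 10 if words(doc) == words(query) else 0  # hypothetical bonus
    return shared + exact_bonus


def search(query, min_words=2):
    """Filter with matches(), then order by weight() (highest first, then id)."""
    matched = [doc_id for doc_id, doc in DOCS.items()
               if matches(doc, query, min_words)]
    return sorted(matched, key=lambda doc_id: (-weight(DOCS[doc_id], query), doc_id))


print(search("Miley Cyrus"))                # row 3 comes first (exact match)
print(search("Miley Cyrus Wrecking Ball"))  # rows 1 and 2 come before row 3
```

Changing `matches` changes which rows come back at all; changing `weight` only changes their order. That is the same independence you get from editing the query versus choosing a ranking mode.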

Historically, matching and ranking were combined into one concept: you chose the matching mode (which determined how the query was interpreted), and a suitable ranking calculation was automatically selected.

This turned out to be not flexible enough, so the two were separated. But lots of people relied on the old behaviour, so the old matching modes (any/phrase etc.) were maintained for compatibility reasons.

Internally there is only ONE matching mode: Extended. The older legacy matching modes automatically rewrite the query as needed (changing it to extended query syntax) and pick a particular ranking mode.

So by keeping extended matching mode, you get to choose the ranking mode yourself. You can change the matching (by modifying the query) and the ranking behaviour independently.


I explained all the backstory to show you that if the provided matching modes aren't good enough, you can do the same thing yourself, i.e.

  • You need to choose a particular ranking mode (or even a completely custom one via the ranking expression)

  • AND you may well need to modify the query itself to change the matching behaviour. (Remember, choosing MATCH_ANY changes the query AND selects a ranking mode.)

So you could rewrite the query to use quorum, e.g.

"Miley Cyrus Wrecking Ball"/2

Remember to keep Extended match mode. You can then choose a ranking mode independently (setRankingMode), e.g. you can now use SPH_RANK_SPH04, but you still get 'fuzzy' matching behaviour (like you would with match any).

... don't forget to try other ranking modes too.
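
If the search text comes from users, the quorum rewrite can be done in code before the query is sent. The sketch below is one way to do it, under stated assumptions: `quorum_query` is a hypothetical helper (not part of any Sphinx client), the escaped character set is assumed to match what the client libraries' escape helper covers, and clamping the threshold to the number of words is my own choice so that one-word searches still produce a sensible query.

```python
import re

# Characters assumed to be special in extended query syntax; each one gets a
# backslash in front so user input cannot inject operators.
_SPECIAL = re.compile(r'([\\()|\-!@~"&/^$=<])')


def quorum_query(user_input, threshold=2):
    """Build a quorum query string such as "a b c d"/2 (hypothetical helper)."""
    words = user_input.split()
    if not words:
        return ""
    escaped = [_SPECIAL.sub(r"\\\1", word) for word in words]
    # Never ask for more words than the query has, and never fewer than one.
    needed = max(1, min(threshold, len(words)))
    return '"{}"/{}'.format(" ".join(escaped), needed)


print(quorum_query("Miley Cyrus Wrecking Ball"))
print(quorum_query("Miley Cyrus"))
```

The resulting string is what you pass as the query text, with Extended match mode kept and the ranking mode set separately (setRankingMode, as above).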

Licensed under: CC-BY-SA with attribution