Pergunta

Hello: I want to do a "weighted search" on product that are tagged with keywords. (So: not fulltext search, but n-to-m-relation). So here it is:

Table 'product':
sku  - the primary key
name

Table 'keywords':
kid   - keyword idea
keyword_de  - German language String   (e.g. 'Hund','Katze','Maus')
keyword_en  - English language String  (e.g. 'Dog','Cat','Mouse')

Table 'product_keyword' (the cross-table)
sku   \__ combined primary key
kid   /

What I want is to get a score for all products that at least "contain" one relevant keyword. If I search for ('Dog','Elephant','Maus') I want that

Dog credits a score of 1.003, Elephant of 1.002 Maus of 1.001

So least important search term starts at 1.001, everything else 0.001++. That way, a lower score limit of 3.0 would equal "AND" query (all three keywords must be found), a lower score limit of 1.0 would equal an "OR". Anything in between something more or less matching. In particular by sorting according to this score, most relevant search results would be first (regardless of lower limit)...

I guess I will have to do something with

  IF( keyword1 == 'dog', 1.001, 0) + IF...

maybe inside a SUM() and probably with a GROUP BY at the end of a JOIN over the cross table, eh? But I am fairly clueless how to tackle this.

What would be feasible, is to get the keyword id's from the keywords beforehand. That's a cheap query. So the keywords table can be left ignored and it's all about the other of the cross and product table...

I have PHP at hand to automatically prepare a fairly lengthy PHP statement, but I would like to avoid further multiple SQL statements. In particular since I will limit the query outcome (most often to "LIMIT 0, 20") for paging mode results, so looping a very large number of in between results through a script would be no good...

DANKESCHÖN, if you can help me on this :-)

Foi útil?

Solução

I think a lot of this is in the Lucene engine (http://lucene.apache.org/java/docs/index.html), which is available for PHP in the Zend Framework: http://framework.zend.com/manual/en/zend.search.lucene.html.

EDIT:

If you want to do the weighted thing you are talking about, I guess you could use something like this:

select p.sku, sum(case k.keyword_en when 'Dog' then 1001 when 'Cat' then 1002 when 'Mouse' then 1003 else 0 end) as totalscore
from products p
left join product_keyword pk on p.sku = pk.sku
inner join keywords k on k.kid = pk.kid
where k.keyword_en in ('Dog', 'Cat', 'Mouse')
group by p.sku

(Edit 2: forgot the group by clause.)

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top