Question

I have a problem making search output more practically useful for end users. The problem is related to the algorithm and approach rather than to the exact technology or framework to use.

At the moment we have a database of products that can be described with the following schema:

http://goo.gl/391qj

On the search side we have done fairly standard things: a third-party text search engine with a token analyzer, plus handling of typos and synonyms (this is not the full list, but as I said, the details are out of scope). But we still need to do extra work to bring the search results closer to real-life user needs, probably in a way similar to how Google ranks indexed pages by relevance. Ideas that we have already considered as potentially applicable:

  • Analyze the most popular search requests in widespread search engines (how to obtain them is still an open question) and increase the rank of the index entries that those popular requests would find;
  • Increase the rank of the newest (hot) entries;
  • Increase the rank of the biggest group of entries that match a popular request and have something in common (which is what makes them a group).
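
The three ideas above can be combined into a single re-ranking step applied on top of the text search score. What follows is only a minimal sketch under stated assumptions, not a reference to any particular search library: the `Product` fields, the weights, the 30-day half-life, and the idea of using a shared `group` attribute are all hypothetical and would need tuning against real data.

```python
import math
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    text_score: float  # relevance returned by the text search engine (assumed >= 0)
    popularity: int    # hypothetical: number of popular external queries matching this entry
    age_days: float    # days since the entry was added
    group: str         # hypothetical: shared attribute that defines a group, e.g. a category


def boosted_score(p, group_sizes, half_life_days=30.0,
                  w_pop=0.5, w_fresh=0.3, w_group=0.2):
    """Multiply the text score by popularity, freshness and group-size boosts."""
    # Logarithm dampens the effect of very large counts.
    popularity_boost = math.log1p(p.popularity)
    # Exponential decay: the boost halves every `half_life_days`.
    freshness_boost = 0.5 ** (p.age_days / half_life_days)
    # Entries from larger groups among the current results get a small lift.
    group_boost = math.log1p(group_sizes.get(p.group, 0))
    multiplier = (1.0
                  + w_pop * popularity_boost
                  + w_fresh * freshness_boost
                  + w_group * group_boost)
    return p.text_score * multiplier


def rerank(products, **kwargs):
    """Return the products sorted by boosted score, highest first."""
    group_sizes = {}
    for p in products:
        group_sizes[p.group] = group_sizes.get(p.group, 0) + 1
    return sorted(products,
                  key=lambda p: boosted_score(p, group_sizes, **kwargs),
                  reverse=True)
```

The boosts are multiplicative on purpose: an entry that does not match the query at all (text score of zero) stays at zero no matter how new or popular it is, so the extra signals only reorder results that were already relevant.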

I would appreciate any help or a pointer on where to dig.


Solution

You may try pLSA (probabilistic latent semantic analysis); there are many references on the web, and there should be libraries and source code available.

EDIT:

Well, I took a closer look at Lucene recently, and it seems to be a much better answer to what the question actually asked (it does not use pLSA). As for integration with the database, you may use Hibernate Search (although it does not seem to be as powerful as using Lucene directly).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow