Question

I am trying to add recommendations to our e-commerce website using Mahout. I have decided to use an item-based recommender; I have around 60K products, 200K users, and 4M user-product preferences. I am looking for a way to provide recommendations by calculating the item similarities offline, so that the recommender.recommend() method returns results in under 100 milliseconds.

// FileDataModel expects a java.io.File, not a bare path string
DataModel dataModel = new FileDataModel(new File("/FilePath"));

// Tanimoto coefficient (Jaccard) suits boolean preference data
_itemSimilarity = new TanimotoCoefficientSimilarity(dataModel);

// CachingRecommender caches each user's results between refreshes
_recommender = new CachingRecommender(new GenericBooleanPrefItemBasedRecommender(dataModel, _itemSimilarity));

I was hoping someone could point me to a method or a blog post that explains the procedure and challenges of computing the item similarities offline. Also, what is the recommended way of storing the pre-computed similarity results: should they go in a separate DB, or in memcache?

PS - I plan to refresh the user-product preference data every 10-12 hours.


Solution

MAHOUT-1167, which lands in the (soon to be released) Mahout 0.8 trunk, introduced a way to calculate item similarities in parallel on a single machine. I'm just mentioning it so you keep it in mind.
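For reference, a rough sketch of what that API looks like on the 0.8 trunk; the class names (BatchItemSimilarities, MultithreadedBatchItemSimilarities, FileSimilarItemsWriter) come from the MAHOUT-1167 patch, and FileSimilarItemsWriter ships in the mahout-integration module, so treat the exact packages as assumptions until you check your release:

// 'recommender' is an ItemBasedRecommender such as the one in the question.
// Precompute the 25 most similar items per item on all available cores and
// write the pairs to a file for later loading into your store of choice.
BatchItemSimilarities batch = new MultithreadedBatchItemSimilarities(recommender, 25);
int numSimilarities = batch.computeItemSimilarities(
        Runtime.getRuntime().availableProcessors(), // degree of parallelism
        1,                                          // give up after at most 1 hour
        new FileSimilarItemsWriter(new File("similarities.csv")));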

If you are only going to refresh the user-product preference data every 10-12 hours, you are better off having a batch process that stores the precomputed recommendations somewhere and then delivers them to the end user from there. I cannot give detailed advice on the storage side, because that will vary greatly with many factors: your current architecture, software stack, network capacity, and so on. In short: in your batch process, run over all your users, ask for 10 recommendations for each of them, and store the results somewhere to be delivered to the end user.
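A minimal sketch of such a batch job, reusing the Taste classes from the question; the input path and the store() hook are placeholders for whatever storage your stack uses:

import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.ItemBasedRecommender;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

public class BatchRecommendJob {

    public static void main(String[] args) throws Exception {
        // Boolean preferences: one "userID,itemID" pair per line
        DataModel model = new FileDataModel(new File("preferences.csv"));
        ItemBasedRecommender recommender = new GenericBooleanPrefItemBasedRecommender(
                model, new TanimotoCoefficientSimilarity(model));

        // Iterate over every known user and precompute their top 10 items
        LongPrimitiveIterator userIDs = model.getUserIDs();
        while (userIDs.hasNext()) {
            long userID = userIDs.nextLong();
            List<RecommendedItem> recs = recommender.recommend(userID, 10);
            store(userID, recs);
        }
    }

    private static void store(long userID, List<RecommendedItem> recs) {
        // Hypothetical persistence hook: write to the DB or cache your stack uses
    }
}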

Other tips

If you need a response within 100 milliseconds, it's better to do batch processing in the background on your server. That may include the following jobs.

  1. Fetch the data from your own user database (60K products, 200K users and 4M user-product preferences).
  2. Prepare your data model based on the nature of your data (number of parameters, size of data, preference values, and much more). This can be an important step.
  3. Run an algorithm on the data model (you need to choose the right algorithm for your requirements). The recommendation data is produced here.
  4. Post-process the resulting data as required.
  5. Store this data in a database (it is NoSQL in all my projects).

The above steps should run periodically as a batch process.

Whenever a user requests recommendations, your service responds by reading the recommendation data from the pre-calculated DB.
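A minimal sketch of that read path, assuming a hypothetical SQL table recommendations(user_id BIGINT, item_ids VARCHAR) that the batch job keeps refreshed; swap in the client for whatever NoSQL store you actually use:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class RecommendationLookup {

    // Returns the precomputed, comma-separated item IDs for a user,
    // or null if nothing has been stored for that user yet
    public static String fetchRecommendations(long userID) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:your-datasource-url");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT item_ids FROM recommendations WHERE user_id = ?")) {
            ps.setLong(1, userID);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}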

You may look at Apache Mahout for this kind of recommendation task.

These are the steps in brief. Hope this helps!

Licensed under: CC-BY-SA with attribution