Question

I have a question about the meaning of preference values in Mahout. The book Mahout in Action says:

The preference value could be anything, as long as larger values mean stronger positive preferences. For instance, these values might be ratings on a scale of 1 to 5, where 1 indicates items the user can’t stand, and 5 indicates favorites.

Does that mean that the recommender will always interpret the smaller values as negative preferences (not liking)?

I am trying to create a recommender where there are no negative preferences. I don't have explicit preferences at all, but I can derive them from different weighted metrics (number of clicks/edits, amount of edit, kind of edit, etc.). In my implementation, when a user has edited a page, that doesn't mean the user dislikes the page; it means the user likes it to some degree (and the strength of the liking is derived as I describe above).

I have tried using only Boolean preferences (log-likelihood and Tanimoto similarity), but they don't perform well, and in most cases (more than 50%) they are not able to produce any recommendation.

I want to take advantage of the numbers I have to derive preferences so that the recommendations will be better, but I am not sure how. I've tried deriving preference values as above on a scale from 5 to 10, and then giving every user a preference value of 1 for an artificial item (meaning they don't like it). However, I believe this is not a good approach, since it means that every user dislikes the same item.

Does anyone have a better idea of how I can apply user-based and item-based algorithms when I only have "positive" (liking) preference values?


Solution

You should try:

  • the implicit preference ParallelALSFactorizationJob (Hadoop based)
  • or the implicit preference ALSWRFactorizer alongside an SVDRecommender (not Hadoop based; I think this non-Hadoop implicit preference variant is only available in mahout-0.8)

In these, the number you assign to a user's preference for an item indicates how strong that association is; it is not a rating. All associations are therefore positive, just with different strengths. This way you can model your different interactions, such as view, edit, click, etc., although the strength assigned to each will vary according to your particular business.

This presentation (link) should give you a rough idea of what is happening. Also, this paper (link) describes the implicit feedback variant of the factorizers (the two are the same; one is just meant to scale with Hadoop).
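To make the "strength, not rating" idea concrete, here is a minimal plain-Java sketch of how an implicit-feedback factorizer typically treats a raw interaction strength r. It assumes the formulation from "Collaborative Filtering for Implicit Feedback Datasets" (Hu, Koren, Volinsky): a binary preference p = 1 if r > 0, and a confidence c = 1 + alpha * r. The class name, method names, and the alpha value are illustrative assumptions, and this is not Mahout's own code.

```java
public final class ImplicitFeedbackSketch {

    /** Binary preference: 1 if any interaction was observed, otherwise 0. */
    public static double preference(double strength) {
        return strength > 0.0 ? 1.0 : 0.0;
    }

    /** Confidence grows linearly with the interaction strength: c = 1 + alpha * r. */
    public static double confidence(double strength, double alpha) {
        return 1.0 + alpha * strength;
    }

    public static void main(String[] args) {
        double alpha = 40.0;                  // hypothetical value; tune for your data
        double[] strengths = {0.0, 1.0, 5.0}; // e.g. no interaction, one view, several edits
        for (double r : strengths) {
            System.out.println("r=" + r
                    + " preference=" + preference(r)
                    + " confidence=" + confidence(r, alpha));
        }
    }
}
```

Note that a small strength never turns into a "dislike" here: it only lowers the confidence that the user likes the item.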

Other tips

If you mean, can you get reasonable results based only on positive actions, then yes of course. This is the common case. How the preference values are interpreted depends a lot on the algorithm you employ, but I don't see any problem with encoding all positive actions with all positive values, for any algorithms. This is the easy case. "1" is not inherently a negative rating, no.

Your point about similarity metrics is not related to values though. Sounds like your data is very sparse. This is a separate problem.
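As a rough illustration of why sparsity hurts Boolean similarity, here is a small plain-Java sketch of the Tanimoto coefficient (size of the intersection over size of the union of two users' item sets). It is a from-scratch sketch, not Mahout's implementation, and the item IDs are made up. When two users share no items, the similarity is 0, so neither user can contribute recommendations to the other.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public final class TanimotoSketch {

    /** Tanimoto coefficient: |A intersect B| / |A union B|; 0 if both sets are empty. */
    public static double tanimoto(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        // Hypothetical item IDs touched by three users.
        Set<Long> alice = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> bob = new HashSet<>(Arrays.asList(2L, 3L, 4L));
        Set<Long> carol = new HashSet<>(Arrays.asList(7L, 8L));

        System.out.println("alice vs bob:   " + tanimoto(alice, bob));
        System.out.println("alice vs carol: " + tanimoto(alice, carol)); // no overlap
    }
}
```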

The rest I am not sure I understand. The values you use depend on your domain. I would make them proportional to the "strength" or value of each action. For example, if video views are 20x more frequent than video shares, you might make a video share's value as an action 20x higher than a view's. It's a decent place to start.
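One way to sketch this starting point in plain Java: weight each action type by how rare it is relative to the most frequent action, then sum the weighted actions per user and item. The action names, counts, and method names below are hypothetical, and inverse frequency is only one possible reading of "proportional to strength".

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class ActionWeightSketch {

    /** weight(action) = count of the most frequent action / count(action). */
    public static Map<String, Double> inverseFrequencyWeights(Map<String, Long> actionCounts) {
        long max = 0;
        for (long count : actionCounts.values()) {
            max = Math.max(max, count);
        }
        Map<String, Double> weights = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : actionCounts.entrySet()) {
            if (e.getValue() > 0) { // skip actions that were never observed
                weights.put(e.getKey(), (double) max / e.getValue());
            }
        }
        return weights;
    }

    /** Preference strength for one user-item pair: weighted sum of that user's actions. */
    public static double preference(Map<String, Integer> userActions, Map<String, Double> weights) {
        double total = 0.0;
        for (Map.Entry<String, Integer> e : userActions.entrySet()) {
            total += weights.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("view", 2000L); // hypothetical global counts
        counts.put("share", 100L);
        Map<String, Double> weights = inverseFrequencyWeights(counts);

        Map<String, Integer> userActions = new LinkedHashMap<>();
        userActions.put("view", 3);
        userActions.put("share", 1);
        System.out.println(weights + " -> " + preference(userActions, weights));
    }
}
```

Every resulting value is positive, so the least-valued action is simply a weak positive signal rather than a dislike.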

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow