Question

I am trying out Mahout and wondering about the input data model for the non-distributed version.

The file data model has to follow the format userid, itemid, userPreference. The problem is that I don't have these user preference values and would have to precompute them. Does Mahout have any method to do this?

I found an article, http://www.codeproject.com/Articles/620717/Building-A-Recommendation-Engine-Machine-Learning, whose author did not seem to have user preference values either, but used org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -s SIMILARITY_COOCCURRENCE to compute recommendations from {userid, questionid} pairs. From what I can tell, Mahout computes preference values from the data and then computes recommendations. Am I correct in this case?


Solution

If you don't have user preference values, maybe you don't need them. Mahout offers an implementation for recommending items to users without preference values. These are called boolean preferences: you only know that some user likes some item, but not how much. Sometimes this is fine.

Below is sample code showing how this can be done. Only the first line differs from the usual setup: there you tell Mahout that your data model is a GenericBooleanPrefDataModel. With boolean data you can use two similarity measures: LogLikelihoodSimilarity and TanimotoCoefficientSimilarity. Both can be used to compute user-based and item-based recommendations.

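// The classes used here come from the org.apache.mahout.cf.taste packages, plus java.io.File and java.util.List.
// Wrap the file data in a boolean data model: only user-item associations are kept.
DataModel model = new GenericBooleanPrefDataModel(
    GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File("FILE_NAME"))));

// Log-likelihood similarity works without preference values.
UserSimilarity similarity = new LogLikelihoodSimilarity(model);
// Use the 10 most similar users as the neighborhood.
UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

// Recommend 10 items for the user with ID 1.
List<RecommendedItem> recommendations = recommender.recommend(1, 10);

for (RecommendedItem recommendation : recommendations) {
    System.out.println(recommendation);
}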
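For intuition about how a similarity measure can work with boolean data, here is a minimal sketch of the Tanimoto coefficient: the number of items two users share divided by the number of distinct items across both. This is plain Java and not Mahout's implementation; the class name TanimotoSketch and the method tanimoto are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Mahout.
class TanimotoSketch {

    // Tanimoto coefficient of two item sets: |intersection| / |union|, in [0, 1].
    static double tanimoto(Set<Long> itemsA, Set<Long> itemsB) {
        if (itemsA.isEmpty() && itemsB.isEmpty()) {
            return 0.0; // convention chosen in this sketch for two empty sets
        }
        Set<Long> intersection = new HashSet<>(itemsA);
        intersection.retainAll(itemsB);
        int unionSize = itemsA.size() + itemsB.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        Set<Long> user1 = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> user2 = new HashSet<>(Arrays.asList(2L, 3L, 4L));
        // 2 shared items out of 4 distinct items: prints 0.5
        System.out.println(tanimoto(user1, user2));
    }
}
```

No preference values appear anywhere in the computation, which is why this kind of measure fits data that only records that a user liked an item.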

The other alternative is to compute the preference values outside Mahout and feed the resulting data model into some other user-based or item-based algorithm. As far as I know, Mahout does not offer an implementation for computing preference values.

OTHER TIPS

You can define the preference value yourself, but how you do it depends on your data. For example, if the items are tracks that users listen to, the preference value can be the number of times user1 listened to trackA. A preference value should then be defined for every unique userid-itemid pair.

An example of the data model:

userid,itemid,preferences

1,1,3
1,2,5
....
5,1,2

and so on.
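As a rough sketch of computing such values outside Mahout, the snippet below counts how often each userid,itemid pair occurs in a list of raw events and emits userid,itemid,count lines in the format shown above. It assumes each event is one "userid,itemid" line; the class name PreferenceCounter, the method toPreferenceLines, and the sample events are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Mahout.
class PreferenceCounter {

    // Turns raw "userid,itemid" event lines into "userid,itemid,count" lines.
    static List<String> toPreferenceLines(List<String> events) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys give stable output
        for (String event : events) {
            String[] parts = event.split(",");
            if (parts.length != 2) {
                continue; // skip malformed lines in this sketch
            }
            String key = parts[0].trim() + "," + parts[1].trim();
            counts.merge(key, 1, Integer::sum);
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            lines.add(entry.getKey() + "," + entry.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up listening events: user 1 played track 1 three times, and so on.
        List<String> events = Arrays.asList("1,1", "1,2", "1,1", "5,1", "1,1", "5,1");
        for (String line : toPreferenceLines(events)) {
            System.out.println(line);
        }
    }
}
```

The resulting lines can be written to a file and loaded with FileDataModel. Whether a raw count is a good preference value depends on the data; scaling or capping the counts may work better in some cases.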

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow