Why does the LogLikelihoodSimilarity function return values greater than 1.0 for a dataset of 0s and 1s?

https://stackoverflow.com/questions/10179018

01-06-2021

Question

I have a large dataset of preferences that are all expressed as 1.0, and I am using the Tanimoto similarity function with the Generic Boolean User and Item Preference Recommenders. The recommendation scores are generally values between 0 and 1.0.

Many sources, such as the Mahout in Action book and this prior SO thread, recommend the LogLikelihoodSimilarity metric over Tanimoto for boolean datasets. When I switched to the LogLikelihoodSimilarity metric, it generated some scores in a much higher range, such as 11. I had to go back to Tanimoto to get more sensible scores. Can you suggest any potential fixes, or am I misunderstanding the return values of the recommended item scores?

Solution

With boolean preferences (no rating values), the score you see is not a predicted rating. Every preference is 1.0, so a predicted rating would always be 1.0 and could not be used for ranking. The score is instead a sum of similarities, which is why it can be arbitrarily large. It is not supposed to be in [0,1] or any other fixed range.
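
To illustrate the arithmetic, here is a minimal sketch in plain Python. It is not Mahout's code: `score_candidate`, `toy_similarity`, and the item names are hypothetical, and the constant similarity of 0.9 is made up for the example. It only shows that summing pairwise similarities, each within [0,1], can produce a score well above 1.0.

```python
def score_candidate(candidate, preferred_items, similarity):
    """Score a candidate as the sum of its similarities to the user's items.

    Illustrative only: an assumed stand-in for the sum of similarities
    described above, not a copy of any library implementation.
    """
    return sum(similarity(candidate, item) for item in preferred_items)


def toy_similarity(a, b):
    # Made-up constant; a real metric would depend on co-occurrence counts.
    return 0.9


# A hypothetical user with 12 boolean preferences (all implicitly 1.0).
preferred = [f"item{i}" for i in range(12)]

score = score_candidate("candidate", preferred, toy_similarity)
print(round(score, 2))  # about 10.8, even though each similarity is <= 1.0
```

Because every candidate is scored the same way, the totals are still usable for ranking; only their absolute scale differs from a 0-to-1 rating.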

Licensed under: CC-BY-SA with attribution