Apache Mahout + Pearson Correlation Ignores Users With Same Preference For Every Item

StackOverflow https://stackoverflow.com/questions/7773170

  •  09-02-2021

Question

I'm using Mahout with the Pearson Correlation algorithm to compare and find similar users based on their preferences for several items. The problem I'm running into is that Mahout and/or Pearson is ignoring users that select the same preference for every item. Does anyone know if there is a way to configure Mahout to NOT ignore people that select the same preference value for every item?


Solution

It is not a question of configuration. The Pearson correlation is undefined in this case, so no similarity can be computed between such a user and anyone else using this metric.

Essentially, Pearson is the ratio of the two preference series' covariance to the product of their standard deviations. But when one or both series are constant (the user gave every item the same preference value), that series' standard deviation is 0, as is the covariance, so the correlation is 0/0.
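To make the 0/0 concrete, here is a minimal sketch in plain Java. It is not Mahout's implementation; the class name PearsonDemo and the method pearson are made up for illustration, and the sample ratings are invented. It computes the correlation directly from the definition and shows that a series with the same value for every item yields NaN.

```java
// Plain-Java illustration only; hypothetical names, not Mahout's code.
class PearsonDemo {

    /** Pearson correlation of two equal-length series; NaN when it is undefined. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("series must be non-empty and of equal length");
        }
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0.0, varX = 0.0, varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;   // un-normalized covariance
            varX += dx * dx;  // un-normalized variance of x
            varY += dy * dy;  // un-normalized variance of y
        }
        // The 1/n factors cancel between numerator and denominator.
        // For a constant series this evaluates 0.0 / 0.0, which is NaN in Java.
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        double[] varied = {1, 2, 3, 4, 5};
        double[] scaled = {2, 4, 6, 8, 10};
        double[] constant = {3, 3, 3, 3, 3};  // same preference for every item

        System.out.println(pearson(varied, scaled));    // prints 1.0
        System.out.println(pearson(varied, constant));  // prints NaN
    }
}
```

If such users still need to be compared, the options are to use a different similarity metric or to handle the NaN case explicitly in calling code; which one is appropriate depends on the data.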

(This and a few other Pearson gotchas are covered in Chapter 4 of Mahout in Action, and I'm the author of this part of the book and code.)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow