Why does Apache Mahout ItemSimilarity use L_p-space normalization?

https://stackoverflow.com/questions/22987142

01-07-2023

Question

Why is L_p-space normalization used in Mahout's VectorNormMapper for item similarity? I have also read that a norm power of 2 works well for CosineSimilarity.

Is there an intuitive explanation of why it is being used, and how can the best value of the power be determined for a given Similarity class?

Answer

Vector norms can be defined for any L_p metric with p >= 1. Different norms have different properties, and which one fits depends on the problem you are working on. Common values of p are 1 and 2; p = 0 (which counts the non-zero entries and is not a true norm) is used occasionally.
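To make the definition concrete, here is a minimal sketch of an L_p norm and the corresponding normalization on plain `double[]` arrays. It does not use Mahout's API; the class name `LpNormSketch` and its methods are hypothetical and only illustrate the math, assuming a finite p >= 1.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Mahout: illustrates L_p norms on raw arrays.
final class LpNormSketch {

    /** L_p norm: (sum_i |x_i|^p)^(1/p), for finite p >= 1. */
    static double norm(double[] v, double p) {
        if (p < 1.0 || Double.isInfinite(p) || Double.isNaN(p)) {
            throw new IllegalArgumentException("p must be finite and >= 1");
        }
        double sum = 0.0;
        for (double x : v) {
            sum += Math.pow(Math.abs(x), p);
        }
        return Math.pow(sum, 1.0 / p);
    }

    /** Returns a copy of v scaled so that its L_p norm is 1 (zero vectors are returned unchanged). */
    static double[] normalize(double[] v, double p) {
        double n = norm(v, p);
        double[] out = v.clone();
        if (n == 0.0) {
            return out;
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= n;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] v = {3.0, 4.0};
        // The same vector has a different length under different norms.
        System.out.println("L_1 norm: " + norm(v, 1.0));
        System.out.println("L_2 norm: " + norm(v, 2.0));
        System.out.println("L_2-normalized: " + Arrays.toString(normalize(v, 2.0)));
    }
}
```

The choice of p changes what "unit length" means: L_1 normalization makes the absolute values sum to 1, while L_2 normalization makes the Euclidean length 1.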

Certain similarity functions in Mahout are closely related to a particular norm. Your example of the cosine similarity is a good one. The cosine similarity is computed by scaling both input vectors to have L_2 length = 1 and then taking the dot product. This value is equal to the cosine of the angle between the vectors if the vectors are expressed in Cartesian space. This value is also 1 - d^2/2, where d is the L_2 norm of the difference between the normalized vectors, because d^2 = 2 - 2 cos(theta) for unit vectors.

This means that there is an intimate connection between cosine similarity and L_2 distance.
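The connection can be checked numerically. For unit vectors a and b, expanding ||a - b||^2 gives 2 - 2 (a . b), so the cosine equals 1 - d^2/2. The sketch below verifies this on plain arrays; it is not Mahout code, and the class `CosineVsL2Sketch` and its method names are made up for illustration.

```java
// Hypothetical helper, not part of Mahout: relates cosine similarity to L_2 distance.
final class CosineVsL2Sketch {

    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Scales v to L_2 length 1; a zero vector has no direction, so it is rejected. */
    static double[] unit(double[] v) {
        double n = Math.sqrt(dot(v, v));
        if (n == 0.0) {
            throw new IllegalArgumentException("cannot normalize a zero vector");
        }
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / n;
        }
        return out;
    }

    /** Cosine similarity: dot product of the L_2-normalized vectors. */
    static double cosine(double[] a, double[] b) {
        return dot(unit(a), unit(b));
    }

    /** Squared L_2 distance d^2 between the L_2-normalized vectors. */
    static double squaredUnitDistance(double[] a, double[] b) {
        double[] ua = unit(a);
        double[] ub = unit(b);
        double sum = 0.0;
        for (int i = 0; i < ua.length; i++) {
            double diff = ua[i] - ub[i];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        double[] b = {4.0, 5.0, 6.0};
        double cos = cosine(a, b);
        double d2 = squaredUnitDistance(a, b);
        // The two printed values should agree up to floating-point rounding.
        System.out.println("cosine similarity: " + cos);
        System.out.println("1 - d^2 / 2:       " + (1.0 - d2 / 2.0));
    }
}
```

Because the mapping between cosine and d is monotone, ranking items by cosine similarity gives the same order as ranking them by L_2 distance between their normalized vectors.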

Does that answer your question?

These questions are likely to get answered more quickly on the Apache Mahout mailing lists, btw.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow