I was reading this question:

How to understand Locality Sensitive Hashing?

But then I found that the cosine similarity is calculated as follows: cos(v1, v2) = cos(theta), where theta = (Hamming distance / signature length) * pi = (h / b) * pi.

This means that if the vectors are identical, the Hamming distance will be zero and the cosine will be 1. But when the vectors are completely dissimilar, the Hamming distance will equal the signature length, so we get cos(pi), which is -1. Shouldn't the similarity always be between 0 and 1?
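A minimal sketch of the estimator described above, assuming the signatures come from random-hyperplane LSH: each bit records which side of a random hyperplane (with a Gaussian normal vector) the vector falls on. A random hyperplane separates two vectors with probability theta / pi, which is why (h / b) * pi estimates the angle. The function names and the default of b = 2000 hyperplanes are hypothetical choices for illustration. Identical vectors give h = 0 and a cosine of 1, while opposite vectors fall on opposite sides of every hyperplane, so h = b and the estimate is cos(pi) = -1.

```python
import math
import random

def signature(v, planes):
    # One bit per hyperplane: 1 if v is on the non-negative side, else 0.
    return [1 if sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0 else 0 for p in planes]

def hamming(a, b):
    # Number of bit positions where the two signatures differ.
    return sum(x != y for x, y in zip(a, b))

def estimate_cosine(v1, v2, b=2000, seed=0):
    # Hypothetical setup: b random hyperplanes with Gaussian normal vectors.
    rng = random.Random(seed)
    dim = len(v1)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(b)]
    h = hamming(signature(v1, planes), signature(v2, planes))
    theta = (h / b) * math.pi  # estimated angle between v1 and v2
    return math.cos(theta)     # estimated cosine similarity, in [-1, 1]
```

For example, estimate_cosine([1, 2, 3], [-1, -2, -3]) returns -1.0, and for orthogonal vectors such as [1, 0] and [0, 1] the estimate comes out close to 0, with the error shrinking as b grows.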


Solution

Cosine similarity is the dot product of the vectors divided by the product of their magnitudes, so the cosine of the angle between them can be negative. For example, if you have unit vectors pointing in opposite directions, you want the value to be -1.

I think what's confusing you is the nature of the representation. The other post talks about angles between vectors in 2-D space, where vectors can point in any direction. In practice, it is more common to build vectors in a multidimensional space where the number of dimensions is much greater than 2 and the value in each dimension is non-negative (e.g., whether a word occurs in a document or not). The dot product of two such vectors can never be negative, so the angle between them is at most pi/2 and the cosine similarity falls in the 0 to 1 range.
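A small sketch of the definition in the answer; the helper name cosine_similarity is hypothetical. Vectors pointing in opposite directions give -1, while vectors with only non-negative entries, such as 0/1 word-occurrence vectors, always give a value between 0 and 1.

```python
import math

def cosine_similarity(u, v):
    # Dot product divided by the product of the two magnitudes.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Opposite unit vectors in 2-D: the cosine is negative.
opposite = cosine_similarity([1, 0], [-1, 0])

# Word-occurrence vectors (1 = word present): entries are non-negative,
# so the result can only fall between 0 and 1.
no_shared_words = cosine_similarity([1, 0, 0], [0, 1, 0])
one_shared_word = cosine_similarity([1, 1, 0, 1], [0, 1, 1, 0])
```

Here opposite is -1.0, no_shared_words is 0.0, and one_shared_word is 1/sqrt(6), roughly 0.41.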

Licensed under: CC-BY-SA with attribution