Correlations - Get values in the way we want
16-10-2019
Question
I have:
a matrix X with N rows
a vector Y
I computed the Euclidean distance between Y and each row of X.
This gives me a vector of distances.
What I want is a vector of scores between 0 and 1, 1 meaning "very" high correlation, 0 meaning "no" correlation.
Here is what I did:
I divided the vector of distances by the maximum distance in it, which gives a vector D.
The final result is 1 - D, with values between 0 and 1.
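The procedure described above can be sketched as follows. This is only an illustration in plain Python: the function names and the toy values of X and Y are assumptions, not taken from the question.

```python
import math

def euclidean_distances(X, y):
    """Euclidean distance between the vector y and each row of X."""
    return [math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(row, y))) for row in X]

def max_normalized_scores(distances):
    """Score = 1 - d / max(d), so the farthest row always scores 0."""
    d_max = max(distances)
    if d_max == 0:
        # All rows coincide with y; treat them all as perfect matches.
        return [1.0 for _ in distances]
    return [1.0 - d / d_max for d in distances]

# Toy data (hypothetical): three rows compared against the origin.
X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
y = [0.0, 0.0]

distances = euclidean_distances(X, y)
scores = max_normalized_scores(distances)
```

Because every distance is divided by the single largest one, one unusually distant row is enough to push all the other scores towards 1, which may be part of what is observed here.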
The problem is that many of the values (75%) are too close to 1. Do you think what I did is correct?
How would you get a better result? (Between 0 and 1, but not with everything too close to 1.)
For now, I have tried squaring the result, which stays between 0 and 1 but makes the values smaller.
Here is a picture of the distance values I want to turn into a score.
Solution
Several kernel functions can serve as similarity functions (i.e. scores). See a list, for example, here. You can try several of them and see which one suits you best.
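You need something that drops quickly at small distances. You can try $$ \text{score} = \frac{1}{(1 + a \cdot \text{distance})^2} $$ and adjust the coefficient a in front of the distance so that the scores spread out between 0 and 1 instead of clustering near 1. For non-negative distances this score always lies between 0 and 1, and equals 1 at distance 0.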
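As a rough sketch of the suggested score, the snippet below evaluates it for two values of the coefficient in front of the distance. The name `inverse_square_score`, the parameter `a`, and the sample distances are assumptions made for illustration.

```python
def inverse_square_score(distance, a=1.0):
    """score = 1 / (1 + a * distance)^2, which lies in (0, 1] for distance >= 0."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + a * distance) ** 2

# Hypothetical distances; a larger coefficient makes the score drop faster.
sample_distances = [0.0, 1.0, 3.0]
scores_a1 = [inverse_square_score(d, a=1.0) for d in sample_distances]
scores_a3 = [inverse_square_score(d, a=3.0) for d in sample_distances]
```

In practice the coefficient would probably be tuned against the typical scale of the distances, for example so that a "medium" distance maps to a score around 0.5.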
About your picture: what are the axis labels, and what do the x-ticks represent?
OTHER TIPS
A sigmoidal function can also be used to map the distances to scores between 0 and 1. Octave/MATLAB is convenient for applying such a function to your whole matrix.
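The tip does not say which sigmoidal function to use, so the following is only one possible reading: a decreasing logistic curve, written in Python rather than Octave/MATLAB. The parameters `midpoint` and `steepness` are assumptions introduced here and would need tuning to the data.

```python
import math

def sigmoid_score(distance, midpoint, steepness=1.0):
    """Decreasing logistic score: 0.5 at `midpoint`, near 1 well below it, near 0 well above."""
    z = steepness * (distance - midpoint)
    # Clamp the exponent so math.exp cannot overflow for very large distances.
    z = max(min(z, 700.0), -700.0)
    return 1.0 / (1.0 + math.exp(z))

# Hypothetical distances scored around a midpoint of 2.0.
sample = [0.0, 2.0, 4.0]
sigmoid_scores = [sigmoid_score(d, midpoint=2.0, steepness=5.0) for d in sample]
```

Unlike the max-normalized score, this one does not reach exactly 1 at distance 0; it only approaches 1 as the distance falls far below the midpoint.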