Question

I have:

  • a matrix X with N rows

  • a vector Y

I've computed the Euclidean distance between Y and each row of X.

What I get is a vector of distances.

What I want is a vector of scores between 0 and 1, where 1 means very high correlation and 0 means no correlation.

Here is what I did:

I divided the vector of distances by the max distance inside it. I get vector D.

1 - D is the final result with values between 0 and 1.
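As a minimal sketch, the steps above could look like this in pure Python. The function names and the toy values for X and Y are illustrative assumptions, not my real data.

```python
import math

def euclidean_distances(X, y):
    """Euclidean distance from each row of X to the vector y."""
    return [math.dist(row, y) for row in X]

def max_normalized_scores(distances):
    """Score = 1 - d / max(d): the farthest row gets 0, a zero distance gets 1."""
    d_max = max(distances)
    if d_max == 0:
        # All rows coincide with y, so every score is 1.
        return [1.0 for _ in distances]
    return [1.0 - d / d_max for d in distances]

# Toy data (hypothetical): three rows, two columns.
X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
y = [0.0, 0.0]

distances = euclidean_distances(X, y)       # [0.0, 5.0, 10.0]
scores = max_normalized_scores(distances)   # [1.0, 0.5, 0.0]
```

Because everything is divided by the single largest distance, one very distant row is enough to push all the other scores toward 1.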

The problem is that many of the values (75%) are too close to 1. Do you think what I did is correct?

How would you get a better result? (Between 0 and 1, but without everything being too close to 1.)

For now, I have tried squaring the result, which keeps it between 0 and 1 but shrinks the values.

Here is a picture of the distance values I want to turn into a score: [image: distance values I want to turn into a score]

Solution

Several kernel functions can serve as similarity functions (that is, scores). See a list, for example, here. You can try several of them and see which one suits you best.

You need something that drops quickly at small distances. You can try $$ \text{score} = \frac{1}{(1 + a \cdot \text{distance})^2} $$ and adjust the coefficient $a$ in front of the distance so that the scores spread over the range between 0 and 1.
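A minimal sketch of this score in pure Python follows. The coefficient in front of the distance is called `scale` here, and both that name and the sample distances are assumptions made for illustration.

```python
def inverse_square_score(distance, scale=1.0):
    """Score 1 / (1 + scale * distance)^2 for a non-negative distance.

    The score is 1 at distance 0 and decays toward 0 as the distance grows.
    A larger `scale` makes it drop faster at small distances.
    """
    return 1.0 / (1.0 + scale * distance) ** 2

# Hypothetical distances, just to compare two choices of the coefficient.
sample_distances = [0.0, 1.0, 3.0, 9.0]
gentle = [inverse_square_score(d, scale=0.1) for d in sample_distances]
steep = [inverse_square_score(d, scale=1.0) for d in sample_distances]
```

One way to pick `scale` is to decide which distance should map to a given score, for example choosing `scale` so that the median distance gets a score of about 0.25, which corresponds to `scale = 1 / median_distance`.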

About your picture: what are the axis labels, and what are the x-ticks?

OTHER TIPS

Use a sigmoidal function to map the distances to a correlation-like score. You can use Octave/MATLAB to apply the function to your matrix.
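The tip does not say which sigmoidal function to use, so the following is only one possible reading, sketched in pure Python rather than Octave/MATLAB. It uses a flipped logistic curve, 2 / (1 + exp(k * d)), written in the equivalent form 1 - tanh(k * d / 2) to avoid overflow. The name `sigmoid_score` and the parameter `steepness` are assumptions.

```python
import math

def sigmoid_score(distance, steepness=1.0):
    """Map a non-negative distance to a score in [0, 1].

    Equivalent to 2 / (1 + exp(steepness * distance)): the score is 1 at
    distance 0 and decreases toward 0 as the distance grows.
    """
    return 1.0 - math.tanh(steepness * distance / 2.0)

# Hypothetical distances, for illustration only.
sample = [0.0, 0.5, 2.0, 10.0]
sigmoid_scores = [sigmoid_score(d, steepness=1.0) for d in sample]
```

As with the other scores, `steepness` controls how quickly the values fall away from 1 and would need tuning against the actual distribution of distances.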

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange