Question

My feature vector has both continuous (or widely ranging) and binary components. If I simply use Euclidean distance, the continuous components will have a much greater impact:

For example, if I represent symmetric vs. asymmetric as 0 and 1, and some less important ratio as a value ranging from 0 to 100, changing from symmetric to asymmetric has a tiny impact on the distance compared to changing the ratio by 25.

I can add more weight to the symmetry (by making it 0 or 100 for example), but is there a better way to do this?

Solution

You could try using the normalized Euclidean distance, described, for example, at the end of the first section here.

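It simply scales every feature (continuous or discrete) by its standard deviation. This is more robust than, say, scaling by the range (max - min) as suggested by another poster, because the range is determined entirely by the two most extreme values, so a single outlier can change it drastically.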
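To make this concrete, here is a minimal sketch in R using the setup from the question. The data frame `features`, its column names, and the helper `normalized_euclidean` are all made up for illustration; only `sd`, `sapply`, `sqrt`, and `sum` come from base R.

```r
# Hypothetical toy data: a binary symmetry flag and a ratio on a 0-100 scale.
features <- data.frame(
  asymmetric = c(0, 1, 0, 1, 1, 0),
  ratio      = c(10, 35, 60, 85, 20, 95)
)

# Normalized Euclidean distance: divide each per-feature difference by that
# feature's standard deviation before squaring and summing.
normalized_euclidean <- function(a, b, sds) {
  sqrt(sum(((a - b) / sds)^2))
}

# Per-column sample standard deviations, estimated from the data.
sds <- sapply(features, sd)

x             <- c(asymmetric = 0, ratio = 50)
flip_symmetry <- c(asymmetric = 1, ratio = 50)  # only the flag changes
shift_ratio   <- c(asymmetric = 0, ratio = 75)  # only the ratio changes, by 25

d_flip  <- normalized_euclidean(x, flip_symmetry, sds)
d_shift <- normalized_euclidean(x, shift_ratio, sds)
```

With this toy data, `d_flip` comes out larger than `d_shift`, whereas the plain Euclidean distances would be 1 and 25. If you want all pairwise distances at once, `dist(scale(features))` should give the same numbers, since `scale()` divides each column by its standard deviation by default and the centering it also performs does not affect distances.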

OTHER TIPS

If I understand your question correctly, normalizing (aka 'rescaling') each dimension or column in the data set is the conventional technique for keeping some dimensions from being over-weighted, e.g.,

ev_scaled = (ev_raw - ev_min) / (ev_max - ev_min)

In R, for instance, you can write this function:

ev_scaled = function(x) {
    (x - min(x)) / (max(x) - min(x))
}  

which works like this:

# generate some data:
# v1, v2 are two explanatory variables in the same dataset
# but have very different scales:
> v1 = seq(100, 550, 50)
> v1
  [1] 100 150 200 250 300 350 400 450 500 550
> v2 = sort(sample(seq(.1, 20, .1), 10))
> v2
  [1]  0.2  3.5  5.1  5.6  8.0  8.3  9.9 11.3 15.5 19.4
> mean(v1)
  [1] 325
> mean(v2)
  [1] 8.68

# now normalize v1 & v2 using the function above:
> v1_scaled = ev_scaled(v1)
> v1_scaled
  [1] 0.000 0.111 0.222 0.333 0.444 0.556 0.667 0.778 0.889 1.000
> v2_scaled = ev_scaled(v2)
> v2_scaled
  [1] 0.000 0.172 0.255 0.281 0.406 0.422 0.505 0.578 0.797 1.000
> mean(v1_scaled)
  [1] 0.5
> mean(v2_scaled)
  [1] 0.442
> range(v1_scaled)
  [1] 0 1
> range(v2_scaled)
  [1] 0 1
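To tie this back to the original question, the same function can be applied to every column of a data set before computing distances. The following is only a sketch: the data frame `features` and its column names are invented for illustration, and it assumes no column is constant (otherwise `max(x) - min(x)` is zero and the result is `NaN`).

```r
# Min-max rescaling, as defined above.
ev_scaled <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Hypothetical toy data: a binary symmetry flag and a ratio on a 0-100 scale.
features <- data.frame(
  asymmetric = c(0, 1, 0, 1),
  ratio      = c(10, 35, 60, 85)
)

# Rescale every column to [0, 1], then compute pairwise Euclidean distances.
scaled    <- as.data.frame(lapply(features, ev_scaled))
distances <- as.matrix(dist(scaled))
```

After rescaling, a flip of the binary flag contributes a difference of 1, the same as moving the ratio across its entire observed range.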

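You can also try the Mahalanobis distance instead of Euclidean. It rescales the differences by the inverse of the features' covariance matrix, so it accounts for both the differing scales and any correlation between features.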
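A rough sketch of how this could look in R is below. The toy data and the wrapper `mahalanobis_distance` are hypothetical. Base R's `mahalanobis()` returns the squared distance, hence the `sqrt()`. It also assumes the covariance matrix is invertible, which fails if, say, one feature is constant.

```r
# Hypothetical toy data: a binary symmetry flag and a ratio on a 0-100 scale.
features <- data.frame(
  asymmetric = c(0, 1, 0, 1, 1, 0),
  ratio      = c(10, 35, 60, 85, 20, 95)
)

# Sample covariance matrix of the features, estimated from the data.
S <- cov(features)

# Distance between two points a and b; stats::mahalanobis() gives the
# squared distance of a from the "center" b, so take the square root.
mahalanobis_distance <- function(a, b, S) {
  sqrt(mahalanobis(a, center = b, cov = S))
}

x             <- c(asymmetric = 0, ratio = 50)
flip_symmetry <- c(asymmetric = 1, ratio = 50)

d <- mahalanobis_distance(x, flip_symmetry, S)
```

If the covariance matrix happens to be diagonal, this reduces to the normalized Euclidean distance, i.e. each difference divided by that feature's standard deviation.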

Licensed under: CC-BY-SA with attribution