Question

My data looks like this:

data =

    2.29    2.048333333 2   2
    2.29    2.048333333 2   2
    2.29    2           2   2
    2.29    2.064444444 2   2

I want to calculate the Euclidean distance between every pair of columns. The result should be a 4x4 matrix, and all diagonal elements are 0 because each column's distance to itself is 0.

How can I do this efficiently?

So far, I can only compute the Euclidean distance between two columns.

Should I call that repeatedly in a loop?
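To make the target concrete: entry (i, j) of the matrix is the square root of the summed squared differences between column i and column j. A quick check of two entries on the sample data (this assumes Python 3.8+, where `math.dist` is available):

```python
import math

# The sample data from the question, one inner list per row.
data = [
    [2.29, 2.048333333, 2, 2],
    [2.29, 2.048333333, 2, 2],
    [2.29, 2,           2, 2],
    [2.29, 2.064444444, 2, 2],
]

col1 = [row[0] for row in data]
col3 = [row[2] for row in data]
col4 = [row[3] for row in data]

d13 = math.dist(col1, col3)  # each difference is 0.29, so sqrt(4 * 0.29**2) = 0.58
d34 = math.dist(col3, col4)  # identical columns, so the distance is 0
print(round(d13, 6), d34)
```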


Solution

Try this:

import math

def main(data):
    # Transpose so that each column of data becomes one vector.
    columns = list(zip(*data))
    n = len(columns)
    total = []
    for i in range(n):
        row = []
        for j in range(n):
            row.append(dist(columns[i], columns[j]))
        total.append(row)
    return total

def dist(a, b):
    # Euclidean distance: square root of the sum of squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def output(t):  # not necessary; just prints the matrix tidily
    for row in t:
        print("\t".join("%.4f" % v for v in row))

data = [[1, 1, 1], [1, 2, 3], [0, 0, 0]]  # just for testing
t = main(data)
output(t)
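Because the distance is symmetric (d(i, j) = d(j, i)) and the diagonal is always 0, a double loop over all n x n pairs does roughly twice the necessary work. A minimal pure-Python sketch that fills only the upper triangle and mirrors it (the function name `column_distance_matrix` is hypothetical, and `math.dist` needs Python 3.8+):

```python
import math

def column_distance_matrix(data):
    """Return the n x n Euclidean distance matrix between the columns of data."""
    columns = list(zip(*data))               # transpose: one tuple per column
    n = len(columns)
    result = [[0.0] * n for _ in range(n)]   # diagonal stays 0.0
    for i in range(n):
        for j in range(i + 1, n):            # upper triangle only
            d = math.dist(columns[i], columns[j])
            result[i][j] = d
            result[j][i] = d                 # mirror into the lower triangle
    return result

m = column_distance_matrix([[1, 1, 1], [1, 2, 3], [0, 0, 0]])
for row in m:
    print(row)
```

The columns of the test data are (1, 1, 0), (1, 2, 0) and (1, 3, 0), so the off-diagonal distances are 1, 2 and 1.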

OTHER TIPS

If data is a NumPy array, this version may be more efficient, since the distances from one column to all columns are computed in a single vectorized step:

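import numpy as np

cols = np.asarray(data, dtype=float).T  # each row of cols is one column of data
dist = np.empty((len(cols), len(cols)))
for i, x in enumerate(cols):
    dist[:, i] = np.sqrt(np.sum((cols - x) ** 2, axis=1))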
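The loop above still runs once per column in Python. NumPy broadcasting can remove it entirely: subtracting an (n, 1, m) view from a (1, n, m) view yields every pairwise difference at once. A sketch, assuming `data` is a 2-D array whose columns are the vectors of interest; note that it allocates an n x n x m temporary, so for very large inputs the per-column loop may use less memory:

```python
import numpy as np

data = np.array([
    [2.29, 2.048333333, 2, 2],
    [2.29, 2.048333333, 2, 2],
    [2.29, 2,           2, 2],
    [2.29, 2.064444444, 2, 2],
])

cols = data.T                                # shape (n, m): one row per original column
diff = cols[:, None, :] - cols[None, :, :]   # shape (n, n, m): all pairwise differences
dist = np.sqrt((diff ** 2).sum(axis=-1))     # shape (n, n): Euclidean distances
print(dist.round(4))
```

If SciPy is available, `scipy.spatial.distance.squareform(scipy.spatial.distance.pdist(data.T))` should produce the same matrix; `pdist` measures distances between rows, hence the transpose.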
Licensed under: CC-BY-SA with attribution