Question

I've been using rpy2 to calculate the Mahalanobis distance between a test vector and a prior distribution. I'd like to drop rpy2 and move to SciPy, but when I test them side by side, the two don't return the same result. Here's my sample code.

import numpy as np
from scipy import linalg
from scipy.spatial.distance import mahalanobis
import rpy2.robjects as robjects

# The vector to test.
test_values = [692.5816522801106, 1421.4737901031651, 6.117859, 7.259449]
test_values_r = robjects.FloatVector(test_values)
test_values_np = np.array(test_values)

# The covariance matrix from the prior distribution
covs = [15762.87, 13486.23, 34.61164, 22.15451, 
        13486.23, 36003.67, 33.8431, 30.52712, 
        34.61164, 33.8431, 0.4143354, 0.1125765, 
        22.15451, 30.52712, 0.1125765, 0.2592451]
covs_np = np.reshape(np.array(covs), (4,-1))
covs_r  = robjects.r["matrix"](robjects.FloatVector(covs), nrow = 4)

# The means of the prior distribution
centers = [808.0645, 1449.711, 4.8443, 4.95776]
centers_np = np.array(centers)
centers_r  = robjects.FloatVector(centers)

r_dist = robjects.r["mahalanobis"](test_values_r, centers_r, covs_r)
# <FloatVector - Python:0x1052275a8 / R:0x10701bfa8>
# [29.782287]

np_dist = mahalanobis(test_values_np, centers_np, linalg.inv(covs_np))
# 5.4573150053873185

Am I missing something obvious?


Solution

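R's mahalanobis function returns the squared Mahalanobis distance, (x - center)' * inverse(cov) * (x - center), as its documentation states. SciPy's mahalanobis returns the square root of that quantity, so squaring the SciPy result reproduces the R value up to floating-point error.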

Thus:

>>> r_dist[0]
29.782287068025585
>>> np_dist
5.4573150053873185
>>> np_dist**2 - r_dist[0]
3.5527136788005009e-15
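
If the goal is to drop rpy2 entirely, the squared distance can also be computed directly with NumPy. The following is only a sketch using the values from the question; the helper name squared_mahalanobis is my own and is not part of SciPy or R.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

# Values copied from the question.
test_values = np.array([692.5816522801106, 1421.4737901031651, 6.117859, 7.259449])
centers = np.array([808.0645, 1449.711, 4.8443, 4.95776])
covs = np.array([
    [15762.87, 13486.23, 34.61164, 22.15451],
    [13486.23, 36003.67, 33.8431, 30.52712],
    [34.61164, 33.8431, 0.4143354, 0.1125765],
    [22.15451, 30.52712, 0.1125765, 0.2592451],
])


def squared_mahalanobis(x, center, cov):
    """Hypothetical helper: squared Mahalanobis distance, the quantity R returns."""
    delta = x - center
    # Solve cov @ y = delta rather than forming the inverse explicitly.
    return float(delta @ np.linalg.solve(cov, delta))


d2 = squared_mahalanobis(test_values, centers, covs)

# SciPy expects the inverse covariance and returns the (unsquared) distance.
d = mahalanobis(test_values, centers, np.linalg.inv(covs))

print(d2)      # about 29.78, the value R reports in the question
print(d ** 2)  # agrees with d2 up to floating-point error
```

Using np.linalg.solve instead of an explicit inverse is a design choice rather than a requirement; for a 4x4 matrix like this one, either approach should give the same answer to many digits.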
Licensed under: CC-BY-SA with attribution