Question

I've been using rpy2 to calculate the Mahalanobis distance between a test vector and a prior distribution. I'd like to drop rpy2 and move to SciPy, but when I test it, rpy2 and SciPy don't return the same result. Here's my sample code.

import numpy as np
from scipy import linalg
from scipy.spatial.distance import mahalanobis
import rpy2.robjects as robjects

# The vector to test.
test_values = [692.5816522801106, 1421.4737901031651, 6.117859, 7.259449]
test_values_r = robjects.FloatVector(test_values)
test_values_np = np.array(test_values)

# The covariance matrix from the prior distribution
covs = [15762.87, 13486.23, 34.61164, 22.15451, 
        13486.23, 36003.67, 33.8431, 30.52712, 
        34.61164, 33.8431, 0.4143354, 0.1125765, 
        22.15451, 30.52712, 0.1125765, 0.2592451]
covs_np = np.reshape(np.array(covs), (4,-1))
covs_r  = robjects.r["matrix"](robjects.FloatVector(covs), nrow = 4)

# The means of the prior distribution
centers = [808.0645, 1449.711, 4.8443, 4.95776]
centers_np = np.array(centers)
centers_r  = robjects.FloatVector(centers)

r_dist = robjects.r["mahalanobis"](test_values_r, centers_r, covs_r)
# <FloatVector - Python:0x1052275a8 / R:0x10701bfa8>
# [29.782287]

np_dist = mahalanobis(test_values_np, centers_np, linalg.inv(covs_np))
# 5.4573150053873185

Am I missing something obvious?


Solution

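The R function mahalanobis returns the squared Mahalanobis distance (its documentation says so), whereas scipy.spatial.distance.mahalanobis returns the distance itself, i.e. the square root of that value.

Thus, squaring the SciPy result reproduces the R result up to floating-point error: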

>>> r_dist[0]
29.782287068025585
>>> np_dist
5.4573150053873185
>>> np_dist**2 - r_dist[0]
3.5527136788005009e-15
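
If the goal is to drop rpy2 entirely, the squared distance can also be computed directly with NumPy. The following is a minimal sketch using the values from the question; the helper name mahalanobis_squared is hypothetical (it is not part of SciPy or R), and using np.linalg.solve instead of an explicit inverse is a design choice of this sketch, not something the original code does.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

# Values copied from the question.
test_values = np.array([692.5816522801106, 1421.4737901031651, 6.117859, 7.259449])
centers = np.array([808.0645, 1449.711, 4.8443, 4.95776])
covs = np.array([15762.87, 13486.23, 34.61164, 22.15451,
                 13486.23, 36003.67, 33.8431, 30.52712,
                 34.61164, 33.8431, 0.4143354, 0.1125765,
                 22.15451, 30.52712, 0.1125765, 0.2592451]).reshape(4, 4)


def mahalanobis_squared(x, center, cov):
    """Hypothetical helper: squared Mahalanobis distance of x from center.

    This is the quantity R's mahalanobis() reports: delta^T cov^-1 delta.
    """
    delta = x - center
    # Solve cov @ y = delta instead of forming the inverse explicitly.
    return float(delta @ np.linalg.solve(cov, delta))


# Squared distance, comparable to the R result.
d2 = mahalanobis_squared(test_values, centers, covs)

# SciPy's function takes the inverse covariance and returns the unsquared distance.
d = mahalanobis(test_values, centers, np.linalg.inv(covs))

print(d2, d ** 2)
```

Both printed numbers should agree with each other and with the R value shown above, up to floating-point error. Taking np.sqrt(d2) gives the SciPy-style distance if that is the quantity needed downstream.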
License: CC-BY-SA with attribution
Not affiliated with StackOverflow