在两个不同的阵列numpy的点之间的最小欧几里得距离，而不是内

https://stackoverflow.com/questions/1871536

18-09-2019
|

题

我有两个阵列的 X - ý坐标，我想找到在一个阵列中的每个点之间的最小欧几里得距离与所有的所述另一阵列中的点。该阵列是不一定相同的尺寸。例如：

xy1=numpy.array(
[[  243,  3173],
[  525,  2997]])

xy2=numpy.array(
[[ 682, 2644],
[ 277, 2651],
[ 396, 2640]])

我的当前方法遍历在xy每个坐标xy1并计算坐标之间，而另一个坐标的距离。

mindist=numpy.zeros(len(xy1))
minid=numpy.zeros(len(xy1))

for i,xy in enumerate(xy1):
    dists=numpy.sqrt(numpy.sum((xy-xy2)**2,axis=1))
    mindist[i],minid[i]=dists.min(),dists.argmin()

是否有消除for循环并以某种方式做元件逐元件的两个阵列之间的计算的方法吗？我设想生成距离矩阵针对我能找到每行或列中的最小元素。

另一种方式来看待这个问题。说我串联xy1（长度米）和xy2（长度 P 的）转换成xy（长度名词的）和I存储原始的长度阵列。从理论上说，我应该然后能够产生从这些坐标的 N×N的距离矩阵从中我可以抓住一个米×P 的子矩阵。有一种方法，以有效地产生这个子矩阵

解决方案

（数月后） scipy.spatial.distance.cdist( X, Y ) 让所有对距离的， X和Y 2朦胧，朦胧3 ...点击它也做22个不同的规范，详细此处。

# cdist example: (nx,dim) (ny,dim) -> (nx,ny)

from __future__ import division
import sys
import numpy as np
from scipy.spatial.distance import cdist

#...............................................................................
dim = 10
nx = 1000
ny = 100
metric = "euclidean"
seed = 1

    # change these params in sh or ipython: run this.py dim=3 ...
for arg in sys.argv[1:]:
    exec( arg )
np.random.seed(seed)
np.set_printoptions( 2, threshold=100, edgeitems=10, suppress=True )

title = "%s  dim %d  nx %d  ny %d  metric %s" % (
        __file__, dim, nx, ny, metric )
print "\n", title

#...............................................................................
X = np.random.uniform( 0, 1, size=(nx,dim) )
Y = np.random.uniform( 0, 1, size=(ny,dim) )
dist = cdist( X, Y, metric=metric )  # -> (nx, ny) distances
#...............................................................................

print "scipy.spatial.distance.cdist: X %s Y %s -> %s" % (
        X.shape, Y.shape, dist.shape )
print "dist average %.3g +- %.2g" % (dist.mean(), dist.std())
print "check: dist[0,3] %.3g == cdist( [X[0]], [Y[3]] ) %.3g" % (
        dist[0,3], cdist( [X[0]], [Y[3]] ))


# (trivia: how do pairwise distances between uniform-random points in the unit cube
# depend on the metric ? With the right scaling, not much at all:
# L1 / dim      ~ .33 +- .2/sqrt dim
# L2 / sqrt dim ~ .4 +- .2/sqrt dim
# Lmax / 2      ~ .4 +- .2/sqrt dim

其他提示

要通过计算距离的P矩阵的米，这应该工作：

>>> def distances(xy1, xy2):
...   d0 = numpy.subtract.outer(xy1[:,0], xy2[:,0])
...   d1 = numpy.subtract.outer(xy1[:,1], xy2[:,1])
...   return numpy.hypot(d0, d1)

的

.outer呼叫使两个这样的矩阵（沿两个轴的标量的差异），则.hypot呼叫接通那些成相同形状的矩阵（标量欧几里德距离的）。

有关你想要做什么：

dists = numpy.sqrt((xy1[:, 0, numpy.newaxis] - xy2[:, 0])**2 + (xy1[:, 1, numpy.newaxis - xy2[:, 1])**2)
mindist = numpy.min(dists, axis=1)
minid = numpy.argmin(dists, axis=1)

修改的：与其说sqrt，做广场等，你可以使用numpy.hypot：

dists = numpy.hypot(xy1[:, 0, numpy.newaxis]-xy2[:, 0], xy1[:, 1, numpy.newaxis]-xy2[:, 1])

接受的答案没有完全解决的问题，其请求找到的最小的两组点之间距离，而不是在两个的每个点之间的距离集。

Altough一个直接的解决方案，以原来的问题确实由计算的每个对和susequently找到最小的一个之间的距离的，这一点，如果一个仅在最小感兴趣的是没有必要的距离。更快的解决方案存在后一问题。

所有提出的解决方案，它可以扩展为m*p = len(xy1)*len(xy2)运行时间。这对于小的数据集确定，但最佳的解决方案可以写成如鳞m*log(p)，产生巨大的节省大xy2数据集。

可使用来实现此最佳执行时间缩放 SciPy的.spatial.cKDTree 如下

import numpy as np from scipy import spatial xy1 = np.array( [[243, 3173], [525, 2997]]) xy2 = np.array( [[682, 2644], [277, 2651], [396, 2640]]) # This solution is optimal when xy2 is very large tree = spatial.cKDTree(xy2) mindist, minid = tree.query(xy1) print(mindist) # This solution by @denis is OK for small xy2 mindist = np.min(spatial.distance.cdist(xy1, xy2), axis=1) print(mindist)

其中mindist是xy1每个点和在xy2点的集合之间的最小距离

import numpy as np P = np.add.outer(np.sum(xy1**2, axis=1), np.sum(xy2**2, axis=1)) N = np.dot(xy1, xy2.T) dists = np.sqrt(P - 2*N)

许可以下： CC-BY-SA 和归因

不隶属于 StackOverflow