I need to calculate spatial three-point correlation function P(r1, r2) for a set of points in 2D space. Right now I just go through all triplets of points and calculate distances r1 and r2 between pairs of points within a triplet and then plot 2D histogram which gives me desired three-point correlation. This however takes a lot of time even for moderate number of points.

The question is whether it is possible to speed up this calculation?

r1 = []
r2 = []
points = [[numpy.random.random(1), numpy.random.random(1)] for i in range(100)]
for x in points:
    for y in points:
        for z in points:                
            r1.append(scipy.spatial.distance.euclidean(x, y))
            r2.append(scipy.spatial.distance.euclidean(x, z))

pylab.figure()
n, xedges, yedges = numpy.histogram2d(r1, r2, bins = 10)
pylab.imshow(n, interpolation='nearest', origin='low',
             extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
有帮助吗?

解决方案

A few things that will make this faster:

First, with your array of points, instead of nested arrays of length 1 ndarrays, you can just make an Nx2 ndarray:

points = np.random.random((N, 2))

Next, you end up computing each pairwise distance many times, you should compute each distance once and then loop over the elements of the array. Scipy can do this calculation for you using scipy.spatial.distance.pdist. To recover a pairwise distance matrix, you must use scipy.spatial.distance.squareform. Explicitly looping over the elements of the matrix:

r1 = []
r2 = []
d = squareform(pdist(points))
for i in range(N):
    for j in range(N):
        for k in range(N):
            r1.append(d[i,j])
            r2.append(d[i,k])

Finally, you can make it simpler still by noting the relationship between the pairwise distance matrix d and the outputs r1 and r2 (which you can do by inspection for some small N, like 3). Using some array manipulation, you can recover the same arrays using:

d = squareform(pdist(points))
r1 = np.tile(d.reshape((N**2, 1)), N).flatten()
r2 = np.tile(d, N).flatten()

Using only a moderate N=20, on my slow computer, while your code runs in 1.15 s, the method of explicitly looping takes 12.7 ms, and the array manipulation shortcut takes 362 µs, and the latter two should scale better than the original code.

其他提示

for x in points:
    for y in points:
        r1.append(scipy.spatial.distance.euclidean(x, y))
    for z in points:
        r2.append(scipy.spatial.distance.euclidean(x, z))

will reduce complexity from cubic to quadratic. I can't advice more since I don't know the problem area.

许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top