Question

Scipy (http://www.scipy.org/) offers two KD Tree classes; the KDTree and the cKDTree.

The cKDTree is much faster, but is less customizable and query-able than the KDTree (as far as I can tell from the docs).

Here is my problem: I have a list of 3million 2 dimensional (X,Y) points. I need to return all of the points within a distance of X units from every point.

With the KDtree, there is an option to do just this: KDtree.query_ball_tree() It generates a list of lists of all the points within X units from every other point. HOWEVER: this list is enormous and quickly fills up my virtual memory (about 744 million items long).

Potential solution #1: Is there a way to parse this list into a text file as it is writing?

Potential solution #2: I have tried using a for loop (for every point in the list) and then finding that single point's neighbors within X units by employing: KDtree.query_ball_point(). HOWEVER: this takes forever as it needs to run the query millions of times. Is there a cKDTree equivalent of this KDTree tool?

Potential solution #3: Beats me, anyone else have any ideas?

Was it helpful?

Solution

From scipy 0.12 on, both KD Tree classes have feature parity. Quoting its announcement:

cKDTree feature-complete

Cython version of KDTree, cKDTree, is now feature-complete. Most operations (construction, query, query_ball_point, query_pairs, count_neighbors and sparse_distance_matrix) are between 200 and 1000 times faster in cKDTree than in KDTree. With very minor caveats, cKDTree has exactly the same interface as KDTree, and can be used as a drop-in replacement.

OTHER TIPS

Try using KDTree.query_ball_point instead. It takes a single point, or array of points, and produces the points within a given distance of the input point(s).

You can use this function to perform batch queries. Give it, say, 100000 points at a time, and then write the results out to a file. Something like this:

BATCH_SIZE = 100000
for i in xrange(0, len(pts), BATCH_SIZE):
    neighbours = tree.query_ball_point(pts[i:i+BATCH_SIZE], X)
    # write neighbours to a file...
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top