Question

In an python application I'm developing I have an array of 3D points (of size between 2 and 100000) and I have to find the points that are within a certain distance from each other (say between two values, like 0.1 and 0.2). I need this for a graphic application and this search should be very fast (~1/10 of a second for a sample of 10000 points)

As a first experiment I tried to use the scipy.spatial.KDTree.query_pairs implementation, and with a sample of 5000 point it takes 5 second to return the indices. Do you know any approach that may work for this specific case?

A bit more about the application:

The points represents atom coordinates and the distance search is useful to determine the bonds between atoms. Bonds are not necessarily fixed but may change at each step, such as in the case of hydrogen bonds.

Was it helpful?

Solution

Great question! Here is my suggestion:

Divide each coordinate by your "epsilon" value of 0.1/0.2/whatever and round the result to an integer. This creates a "quotient space" of points where distance no longer needs to be determined using the distance formula, but simply by comparing the integer coordinates of each point. If all coordinates are the same, then the original points were within approximately the square root of three times epsilon from each other (for example). This process is O(n) and should take 0.001 seconds or less.

(Note: you would want to augment the original point with the three additional integers that result from this division and rounding, so that you don't lose the exact coordinates.)

Sort the points in numeric order using dictionary-style rules and considering the three integers in the coordinates as letters in words. This process is O(n * log(n)) and should take certainly less than your 1/10th of a second requirement.

Now you simply proceed through this sorted list and compare each point's integer coordinates with the previous and following points. If all coordinates match, then both of the matching points can be moved into your "keep" list of points, and all the others can be marked as "throw away." This is an O(n) process which should take very little time.

The result will be a subset of all the original points, which contains only those points that could be possibly involved in any bond, with a bond being defined as approximately epsilon or less apart from some other point in your original set.

This process is not mathematically exact, but I think it is definitely fast and suited for your purpose.

OTHER TIPS

The first thing that comes to my mind is: If we calculate the distance between each two atoms in the set it will be O(N^2) operations. It is very slow. What about to introduce the statical orthogonal grid with some cells size (for example close to the distance you are interested) and then determine the atoms belonging to the each cell of the grid (it takes O(N) operations) After this procedure you can reduce the time for searching of the neighbors.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top