Closest pair of points algorithm variation

https://stackoverflow.com/questions/15835961

01-04-2022
|

Domanda

I know this may be a duplicate, but it seems like a variation on the 'Closest pair of Points' algorithm.

Given a Set of N points (x, y) in the unit square and a distance d, find all pair of points such that the distance between them is at most d.

For large N the brute force method is not an option. Besides the 'sweep line' and 'divide and conquer' methods, is there a simpler solution? These pair of points are the edges of an undirected graph, that i need to traverse it and say if it's connected or not (which i already did using DFS, but when N = 1 million it never finishes!).

Any pseudocode, comments or ideas are welcome, Thanks!

EDIT: I found this on Sedgewick book (i'm looking at the code right now):

Program 3.18 uses a two-dimensional array of linked lists to improve the running time of Program 3.7 by a factor of about 1/d2 when N is sufficiently large. It divides the unit square up into a grid of equal-sized smaller squares. Then, for each square, it builds a linked list of all the points that fall into that square. The two-dimensional array provides the capability to access immediately the set of points close to a given point; the linked lists provide the flexibility to store the points where they may fall without our having to know ahead of time how many points fall into each grid square.

Soluzione

We are really looking for points that are inside of a circle of center (x,y) and radius d.

The square that encloses circle is a square of center (x,y) and sides 2d. Any point out of this square does not need to be checked, it's out. So, a point a (xa, ya) is out if abs(xa - x) > d or abs (ya -yb) > d.

Same for the square that is enclosed by that circle is a square of center (x,y) and diagonals 2d. Any point out of this square does not need to be checked, it's in. So, a point a (xa, ya) is in if abs(xa - x) < (d * 1.412) or abs(ya -yb) < (d * 1.412).

Those two easy rules combined reduce a lot the number of points to be checked. If we sort the point by their x, filter the possible points, sort by their y, we come to the ones we really need to calculate the full distance.

Altri suggerimenti

For any given point, you can use a Manhattan distance (x-delta plus y-delta) heuristic to filter out most of the points that are not within the distance "d" - filter out any points whose Manhattan distance is greater than (sqrt(2) * d), then run the expensive-and-precise distance test on the remaining points.

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a StackOverflow