Closest pair of points across a line

Question 1

Evgeny's answer works, but it's a lot of effort without library support: compute a full Voronoi diagram plus an additional sweep line algorithm. It's easier to enumerate for both sets of points the points whose Voronoi cells intersect the separating line, in order, and then test all pairs of points whose cells intersect via a linear-time merge step.

To compute the needed fragment of the Voronoi diagram, assume that the x-axis is the separating line. Sort the points in the set by x-coordinate, discarding points with larger y than some other point with equal x. Begin scanning the points in order of x-coordinate, pushing them onto a stack. Between pushes, if the stack has at least three points, say p, q, r, with r most recently pushed, test whether the line bisecting pq intersects the separating line after the line bisecting qr. If so, discard q, and repeat the test with the new top three. Crude ASCII art:

Case 1: retain q

------1-2-------------- separating line
     /  |
  p /   |
   \    |
    q-------r

Case 2: discard q

--2---1---------------- separating line
   \ /
  p X r
   \ /
    q

Question 2

For each point of one set find closest point in other set. While doing this, keep only one pair of points having minimal distance between them. This reduces given problem to other one: "Algorithm to find for all points in set A the nearest neighbor in set B", which could be solved using sweep line algorithm over (1) one set of points and (2) Voronoi diagram for other set.

Algorithm complexity is O((M+N) log M). And this algorithm does not use the fact that two sets of points are separated from each other by a line.

Question 3

well what about this:

determine on which side any point is:
- let P be your points (P0,...Pi,...Pn)
- let A,B be the separator line start-end points
- so: side(Pi) = signum of ((B-A).(Pi-A))
- this is based on simple fact that signum of scalar vector multiplication (dot product) depends on the order of points (see triangle/polygon winding rule for more info)
find minimal distance of any (Pi,Pj) where side(Pi)!=side(pj)
- so first compute all sides for all points O(N)
- then cycle through all Pi and inside that for
- cycle through all Pj and search for min distance.
- if the Pi and Pj groups aprox. equal size tahn it is O((N/2)^2)
you can further optimize the search by 'sort' the points Pi,Pj by 'distance' from AB
- you can use another dot product to do that, this time instead (B-A)
- use perpendicular vector to it let say (C-A)
- discard all points from Pi2 (and similar Pj2 also)
- where ((B-A).(P(i1)-A)) is close to ((B-A).(P(i2)-A))
- and |((C-A).(P(i1)-A))| << |((C-A).(P(i2)-A))|
- beacuese that means that Pi2 is behind Pi1 (farer from AB)
- and close to the normal of AB going near Pi1
- complexity after this optimization strongly depend on the dataset.
- should be O(N+(Ni*Nj)) where Ni/Nj is number of remaining points Pi/Pj
- you need 2N dot products, Ni*Nj distance comparision (do not neet to be sqrt-ed)

Question 4

A typical approach to this problem is a sweep-line algorithm. Suppose you have a coordinate system that contains all points and the line separating points from different sets. Now imagine a line perpendicular to the separating line hopping from point to point in ascending order. For convenience, you may rotate and translate the point set and the separating line such that the separating line equals the x-axis. The sweep-line is now parallel with the y-axis.

Hopping from point to point with the sweep-line, keep track of the shortest distance of two points from different sets. If the first few points are all from the same set, it's easy to find a formula that will tell you which one you'll have to remember until you hit the first point from the other set.

Suppose you have a total of N points. You will have to sort all points in O(N*log(N)). The sweep-line algorithm itself will run in O(N).

Question 5

(I'm not sure if this bears any similarity to David's idea...I only saw it now after I logged in to post my thoughts.) For the sake of the argument, let's say we transposed everything so the dividing line is the x axis and sorted our points by the x coordinate. Assuming N is not too large, if we scan along the x-axis (that is, traverse our sorted list of a's and b's), we can keep a record of the overall minimum and two lists of passed points. The current point in the scan is tested against each passed point from the other list while the distance from the point in the list to (x-coordinate of our scan,0) is greater than or equal to the overall min. In the example below, when reaching b2, we can stop testing at a2.

        scan ->                 
      Y                           
      |         a2         
      |                   
      |   a1           a3   
X--------------------------
      |     b1           b3
      |             b2