Approximated closest pair algorithm

Question 1

I suggest using ideas that are derived from quickly solving k-nearest-neighbor searches.

The M-Tree data structure: (see http://en.wikipedia.org/wiki/M-tree and http://www.vldb.org/conf/1997/P426.PDF ) is designed to reduce the number distance comparisons that need to be performed to find "nearest neighbors".

Personally, I could not find an implementation of an M-Tree online that I was satisfied with (see my closed thread Looking for a mature M-Tree implementation) so I rolled my own.

My implementation is here: https://github.com/jon1van/MTreeMapRepo

Basically, this is binary tree in which each leaf node contains a HashMap of Keys that are "close" in some metric space you define.

I suggest using my code (or the idea behind it) to implement a solution in which you:

Search each leaf node's HashMap and find the closest pair of Keys within that small subset.
Return the closest pair of Keys when considering only the "winner" of each HashMap.

This style of solution would be a "divide and conquer" approach the returns an approximate solution.

You should know this code has an adjustable parameter the governs the maximum number of Keys that can be placed in an individual HashMap. Reducing this parameter will increase the speed of your search, but it will increase the probability that the correct solution won't be found because one Key is in HashMap A while the second Key is in HashMap B.

Also, each HashMap is associated a "radius". Depending on how accurate you want your result you maybe able to just search the HashMap with the largest hashMap.size()/radius (because this HashMap contains the highest density of points, thus it is a good search candidate) Good Luck

Question 2

If you only have sample distances, not original point locations in a plane you can operate on, then I suspect you are bounded at O(E). Specifically, it would seem from your description that any valid solution would need to inspect every edge in order to rule out it having something interesting to say, meanwhile, inspecting every edge and taking the smallest solves the problem.

Planar versions bypass O(V^2), by using planar distances to deduce limitations on sets of edges, allowing us to avoid needing to look at most of the edge weights.

Question 3

Use same idea as in space partitioning. Recursively split given set of points by choosing two points and dividing set in two parts, points that are closer to first point and points that are closer to second point. That is same as splitting points by a line passing between two chosen points.

That produces (binary) space partitioning, on which standard nearest neighbour search algorithms can be used.