Let n
be (count a)
and m
be (count b)
. Then, if a
and b
are both ordered, then this can be done in what I believe ought to be O(n log(log m))
time, in other words, very close to linear in n
.
First, let's re-implement abs
and a binary-search (improvements here) to be independent of host (leveraging a native, e.g. Java's, version ought to be significantly faster)
(defn abs [x]
(if (neg? x) (- 0 x) x))
(defn binary-search-nearest-index [v x]
(if (> (count v) 1)
(loop [lo 0 hi (dec (count v))]
(if (== hi (inc lo))
(min-key #(abs (- x (v %))) lo hi)
(let [m (quot (+ hi lo) 2)]
(case (compare (v m) x)
1 (recur lo m)
-1 (recur m hi)
0 m))))
0))
If b
is sorted, a binary search in b
takes log m
steps. So, mapping this over a
is a O(n log m)
solution, which for the pragmatist is likely good enough.
(defn nearest* [a b]
(map #(b (binary-search-nearest-index b %)) a))
However, we can also use the fact that a
is sorted to divide and conquer a
.
(defn nearest [a b]
(if (< (count a) 3)
(nearest* a b)
(let [m (quot (count a) 2)
i (binary-search-nearest-index b (a m))]
(lazy-cat
(nearest (subvec a 0 m) (subvec b 0 (inc i)))
[(b i)]
(nearest (subvec a (inc m)) (subvec b i))))))
I believe this ought to be O(n log(log m))
. We start with the median of a
and find nearest in b
in log m
time. Then we recurse on each half of a
with split portions of b
. If a m
-proportional factor of b are split each time, you have O(n log log m). If only a constant portion is split off then the half of a
working on that portion is linear time. If that continues (iterative halves of a work on constant size portions of b
) then you have O(n)
.