Cannot find paper: All k nearest neighbors search in N*log(N) using distance indices for log(N) support points
-
04-11-2019 - |
题
I remember reading a paper about finding k nearest neighbors for all N multi-dimensional objects in the set.
I've tries to find it again many times, but have failed so far.
The algorithm was as follows:
- Each object is assigned a sorted queue of nears neighbors (maximum size is k). Let's assume that $k < log(N)$
- Each time you calculate $distance(A, B)$ try to add A into B's neighbor list and vise versa.
- Choose $log(N)$ support objects (say, first $log(N)$ objects out of $N$). (Or maybe that was $sqrt(N)$)
- For each support, build the distance-based index containing all N objects
- After the index construction, every object has a candidate list of k nearest neighbors. All real nearest neighbors are not further than the furthest neighbor in the candidate list - candidate neighbor radius.
- For each point we use triangle inequality and current neighbor radius to constrain the distances between a candidate point and the support points: $|dist(candidate_i, support_j) - dist(point, support_j)| <= radius$. We efficiently extract the candidates that fall in radius-based ranges for all support points.
- We calculate the distances for those candidates and update the nearest-neighbor lists for both the current point and the candidate points.
- As we process more points, the neighbor lists become "tighter" - the radius, the distance to the furthest neighbor decreases with each update of the list.
- Tighter constraints mean that the candidate lists become smaller and smaller after each processed point and leading to total run time of $N*log(N)$.
So, why do I need the paper again? First of all, several people have asked me for it and I was surprised I cannot find it again. More importantly, it did not work well for me. This might be caused by a bad choice of the support objects. I remember nothing about the selection process.
没有正确的解决方案
不隶属于 cs.stackexchange