Question

Problem. Given $X$ a finite metric space of cardinality $n$, construct a data structure in subquadratic time such that the query distance to the kth nearest neighbor of x can be resolved in sublinear time (independent of $k$) for arbitrary $k \leq n$ and $x \in X$.

What I have tried. I am aware of the existence of kdtrees, ball-trees, and cover trees. Under some assumptions (which I'm willing to make), I know that these structures can resolve the query distance to the nearest neighbor of x in sublinear time (often $O(\log(n))$), but I haven't been able to find similar results for the $k$th nearest neighbor for arbitrary $k$.

It seems that, often, one is interested in $k$ values that are small compared to $n$, and that, in those cases, the algorithms mentioned in the previous paragraph can be adapted at the cost of a multiplicative constant of the order of $k$. My problem is that I am interested in $k$ values that are potentially of the order of $n$.

Was it helpful?

Solution

For a general metric, you can't. For instance, one can prove it takes at least quadratic time to answer $\Theta(n)$ queries, in the worst case.

For instance, suppose we have $n/2$ query points that are all very far from each other, and $n/2$ target points. We'll ask for the $n/4$-th closest neighbor of each query point. Then each $d(q,t)$ can be set to any distance in $[1/2,1]$ for each of the $n^2/4$ pairs of a query point $q$ and a target point $t$, and each can be chosen independently, and you'll have a valid metric. Thus you can imagine the distance being specified by a $n/2 \times n/2$ matrix of distances $d(q,t)$. In $o(n^2)$ time, you can only read $o(n^2)$ of the entries in that matrix, which isn't sufficient to answer all of the queries. You can prove this with an adversarial argument: consider any $q$ where the algorithm has read $d(q,t)$ for fewer than $n/4$ different values of $t$; then whatever the algorithm outputs as the $n/4$-th nearest neighbor to $q$, we can arrange the other unread distances so that the algorithm's output was incorrect.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange
scroll top