Classic kNN data structures such as the KD tree used in sklearn
become very slow as the dimension of the data increases. For very high-dimensional problems it is advisable to switch algorithm class and use approximate nearest neighbour (ANN) methods, which sklearn
unfortunately seems to be lacking. See the links below for papers on the algorithms and the theory of why approximate nearest neighbours are so much faster in these cases.
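To give an intuition for the speed-up, here is a toy sketch (plain NumPy, not FLANN's actual algorithm - FLANN uses randomized KD trees and hierarchical k-means) of the locality-sensitive-hashing idea behind many ANN methods: hash points with random hyperplanes, then do the exact distance computation only on the small bucket of candidates instead of the whole dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 10,000 points in 128 dimensions (a typical feature-descriptor size).
data = rng.standard_normal((10_000, 128)).astype(np.float32)
query = rng.standard_normal(128).astype(np.float32)

# --- Exact search: brute-force scan over all n points, O(n * d). ---
exact_idx = np.argmin(np.linalg.norm(data - query, axis=1))

# --- Approximate search: random-projection hashing (LSH-style sketch). ---
# Each random hyperplane contributes one bit: which side of the plane a point lies on.
n_bits = 12
planes = rng.standard_normal((128, n_bits)).astype(np.float32)

def signature(x):
    # Boolean bit signature of a point (or of every row of a matrix).
    return x @ planes > 0

sigs = signature(data)      # (10_000, n_bits) bit signatures, computed once up front
q_sig = signature(query)

# Candidate set: points whose signature differs from the query's in at most 2 bits.
hamming = (sigs != q_sig).sum(axis=1)
candidates = np.where(hamming <= 2)[0]

# Exact distances only over the candidates -- typically a few percent of the data.
best = candidates[np.argmin(np.linalg.norm(data[candidates] - query, axis=1))]
```

The result `best` is only approximately the nearest neighbour (the true one may hash into a distant bucket), but the candidate set is a small fraction of the dataset, which is where the speed-up in high dimensions comes from.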
A well-known ANN library in the C++ world, widely used in computer vision for nearest-neighbour search in feature-descriptor spaces, is
FLANN
. The homepage says it contains Python bindings (I have never worked with them). Another popular alternative is the
ANN
library with a Python wrapper here, although the newer FLANN seems to be more popular at the moment. See also this answer (but some of its links are dead).
One caveat: your data seems to be very high-dimensional - I don't know how these libraries will perform for you. They should still beat sklearn
.