How can I select L neighbors

https://stackoverflow.com/questions/23325425

10-07-2023

Question

I'd like to know how to select the optimal L neighbors of a specific point, for example its 5 nearest neighbors. Is there a parameter I can change for this?

I want to let it select L points where : L = SQRT[number of points in data set]

I have a huge data set, so some points may lie very close to each other while others are far away from them.

L, the number of neighbors to consider, can be chosen arbitrarily or with cross-validation. With more training data, L can be larger, since the training data is denser in the underlying space X. With more discontinuous or nonlinear dynamics in the classification, L should be smaller, to capture these more local fluctuations.
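As a rough illustration of the cross-validation option mentioned above, the sketch below picks L from a handful of candidates with scikit-learn's GridSearchCV. The synthetic data set, the candidate list, and the name best_l are assumptions made for the example, not part of the original question.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic labelled data, used only as a stand-in for a real data set.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Candidate neighbor counts (an arbitrary choice for this sketch).
candidates = [1, 3, 5, 11, 17]

# 5-fold cross-validation over the candidates; the default score is accuracy.
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": candidates}, cv=5)
search.fit(X, y)

best_l = search.best_params_["n_neighbors"]
print("best L:", best_l, "mean CV accuracy:", search.best_score_)
```

This only applies to the supervised case, where labels are available to score each candidate.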

 NearestNeighbors(algorithm='auto', leaf_size=30, n_neighbors=5, p=2,
             radius=1.0, warn_on_equidistant=True)

Solution

I want to let it select L points where : L = SQRT[number of points in data set]

That's not possible unless you compute the number of samples and its square root yourself. You can only pass an integer as n_neighbors.
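A minimal sketch of doing that computation yourself, assuming the data is a NumPy array with one sample per row. The random placeholder data and the variable names are assumptions for illustration.

```python
import math

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(0)
points = rng.randn(400, 3)  # placeholder data: 400 samples, 3 features

# n_neighbors must be an integer, so take the integer square root
# of the sample count (and never go below 1).
n_samples = points.shape[0]
k = max(1, math.isqrt(n_samples))

nnbrs = NearestNeighbors(n_neighbors=k).fit(points)

# The query must be 2-D: one row per query point.
distances, indices = nnbrs.kneighbors(points[:1])
print(k, indices.shape)
```

Because the query point here is itself part of the fitted data, it is returned among its own neighbors.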

The only way to take a variable number of neighbors into account is to use RadiusNeighbors{Classifier,Regressor}, which take a distance cutoff instead of a k parameter.
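For the unsupervised case in the question, the same distance-cutoff idea is available through NearestNeighbors.radius_neighbors. The sketch below is one possible usage. The radius of 0.5 and the random data are arbitrary assumptions and would need tuning for a real data set.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(0)
points = rng.randn(500, 3)  # placeholder data

# Use a distance cutoff instead of a fixed neighbor count.
radius = 0.5  # arbitrary value chosen for this sketch
nnbrs = NearestNeighbors(radius=radius).fit(points)

# One result array per query point; their lengths differ because
# dense regions contain more points within the cutoff.
distances, indices = nnbrs.radius_neighbors(points[:3])
counts = [len(idx) for idx in indices]
print(counts)
```

Points in dense regions get many neighbors and isolated points get few, which matches the unevenly distributed data described in the question.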

OTHER TIPS

Please try the following example:

 import numpy as np
 from sklearn.neighbors import NearestNeighbors

 rng = np.random.RandomState(42)
 points = rng.randn(500, 3)  # 500 random 3-D sample points
 nnbrs = NearestNeighbors(n_neighbors=5)
 nnbrs.fit(points)

 # kneighbors expects a 2-D array of shape (n_queries, n_features)
 point_of_interest = np.array([[0, 1, 0]])
 distances, neighbor_indices = nnbrs.kneighbors(point_of_interest)
 neighbors = points[neighbor_indices[0]]  # the 5 nearest points, shape (5, 3)

Does this give the desired result? Try it on your sparse matrix data, and experiment with the algorithm= parameter (see the docs) if you run into computation time or memory problems.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow