Question

I have a floor on which various sensors are placed at different locations. For every transmitting device, sensors may detect its readings. There can be 6-7 sensors on a floor, and a particular reading may be missed by some sensors while being detected by others.

For every reading I get, I would like to identify its location on the floor. We divide the floor logically into TILEs (5x5 foot areas) and compute what the ideal reading at each TILE should be, as detected by each sensor device (based on some transmission pathloss equation).

I am using the precomputed readings from the N sensor devices at each TILE as a point in N-dimensional space. When I get a real-life reading, I find its nearest neighbours among these points and assign the reading to that location.
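To make the setup concrete, here is a minimal sketch of this fingerprinting approach. Everything in it is illustrative: the sensor positions, the 4x4 tile grid, and the log-distance pathloss model (with hypothetical `tx_power` and exponent `n` parameters) are assumptions standing in for whatever transmission equation you actually use.

```python
import numpy as np

# Hypothetical setup: a 4x4 grid of 5x5 ft tiles and 3 sensors.
sensor_pos = np.array([[0.0, 0.0], [20.0, 0.0], [10.0, 20.0]])  # ft

def expected_rssi(tile_center, sensor_xy, tx_power=-30.0, n=2.0):
    """Log-distance pathloss model: RSSI falls off with 10*n*log10(d).
    This is just one common choice of pathloss equation."""
    d = max(np.linalg.norm(tile_center - sensor_xy), 1.0)
    return tx_power - 10.0 * n * np.log10(d)

# One N-dimensional fingerprint per tile (N = number of sensors).
tiles = np.array([[2.5 + 5 * i, 2.5 + 5 * j]
                  for i in range(4) for j in range(4)])
fingerprints = np.array([[expected_rssi(t, s) for s in sensor_pos]
                         for t in tiles])

def locate(reading):
    """Return the tile center whose fingerprint is nearest the reading."""
    dists = np.linalg.norm(fingerprints - reading, axis=1)
    return tiles[np.argmin(dists)]

print(locate(fingerprints[5]))  # a reading matching tile 5 maps back to it
```

For 16 tiles this brute-force scan is instant; the k-d tree only starts to pay off with a much larger tile grid.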

I would like to know if there is a variant of k-nearest-neighbour search in which a dimension can be REMOVED from consideration. This would be especially useful when a particular sensor is not reporting any reading. I understand that putting a weight on a dimension may be impossible with algorithms like k-d trees or R-trees. However, I would like to know whether a dimension can be discarded when computing nearest neighbours. Is there any such algorithm?

EDIT:

What I want to know is whether the same R-tree/k-d tree can be used for k-nearest searches across different queries, where each query weights the dimensions differently. I don't want to construct a new k-d tree for every different set of dimension weights.

EDIT 2:

Is there any library in Python which allows you to specify a custom distance function and search for the k nearest neighbours? Essentially, I want to use a different custom distance function for each query.
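One way to get a per-query distance function in Python is to skip the tree and use `scipy.spatial.distance.cdist`, which accepts an arbitrary callable as its metric. A sketch, with made-up data, where one query uses plain Euclidean distance and another drops a dimension:

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))   # fingerprints, one row per tile

def knn(query, k, metric):
    """Brute-force k-NN; the metric can differ on every call."""
    d = cdist(query[None, :], X, metric=metric)[0]
    idx = np.argsort(d)[:k]
    return idx, d[idx]

# Query 1: plain Euclidean distance over all four dimensions.
idx_a, _ = knn(X[7], k=3, metric='euclidean')

# Query 2: ignore dimension 2 entirely (e.g. that sensor is silent).
def drop_dim2(u, v):
    keep = [0, 1, 3]
    return np.linalg.norm(u[keep] - v[keep])

idx_b, _ = knn(X[7], k=3, metric=drop_dim2)
print(idx_a[0], idx_b[0])  # the query point itself is nearest in both cases
```

scikit-learn's `NearestNeighbors` also accepts a callable `metric` when `algorithm='brute'`; since brute force builds no tree, switching metrics between queries costs nothing but the scan itself.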

Solution 2

After going through a lot of questions on Stack Overflow, and finally digging into the details of the SciPy k-d tree source code, I realised that the answer by "celion" at the following link is correct:

KD-Trees and missing values (vector comparison)

Excerpt:
"I think the best solution involves getting your hands dirty in the code that you're working with. Presumably the nearest-neighbor search computes the distance between the point in the tree leaf and the query vector; you should be able to modify this to handle the case where the point and the query vector are different sizes. E.g. if the points in the tree are given in 3D, but your query vector is only length 2, then the "distance" between the point (p0, p1, p2) and the query vector (x0, x1) would be

sqrt( (p0-x0)^2 + (p1-x1)^2 )

I didn't dig into the java code that you linked to, but I can try to find exactly where the change would need to go if you need help.

-Chris

PS - you might not need the sqrt in the equation above, since distance squared is usually equivalent."
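The suggestion in the excerpt can be sketched without touching any tree code, as a standalone distance function. This is an illustrative version, not celion's actual patch, and it assumes missing sensor readings are encoded as NaN in the query vector:

```python
import numpy as np

def partial_distance(point, query):
    """Euclidean distance over only the dimensions the query reports.
    Missing sensor readings are encoded as NaN in the query vector,
    so those dimensions simply drop out of the sum."""
    mask = ~np.isnan(query)
    return np.sqrt(np.sum((point[mask] - query[mask]) ** 2))

p = np.array([1.0, 2.0, 3.0])
q = np.array([1.0, np.nan, 5.0])   # middle sensor reported nothing
print(partial_distance(p, q))       # sqrt((1-1)^2 + (3-5)^2) = 2.0
```

As the PS notes, you can compare squared distances and skip the `sqrt` when only the ranking matters.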

OTHER TIPS

Both for R-trees and kd-trees, using weighted Minkowski norms is straightforward. Just put the weights into your distance equations!

Putting weights into the Euclidean point-to-rectangle minimum distance is trivial: just take the regular formula and plug in the weights as desired.

Distances are not used at tree construction time, so you can vary the weights as desired at query time.
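As a concrete illustration of this tip (a hedged sketch, not library code): a weighted Euclidean distance where each query supplies its own weight vector, and a zero weight discards that dimension entirely. Because the tree is built without reference to the metric, the same index serves every weighting.

```python
import numpy as np

def weighted_dist(u, v, w):
    """Weighted Euclidean distance; a weight of 0 discards that dimension."""
    return np.sqrt(np.sum(w * (u - v) ** 2))

u = np.array([0.0, 0.0, 0.0])
v = np.array([3.0, 4.0, 100.0])
print(weighted_dist(u, v, np.array([1.0, 1.0, 1.0])))  # full 3-D distance
print(weighted_dist(u, v, np.array([1.0, 1.0, 0.0])))  # last dim dropped: 5.0
```

Note that a zero weight makes this a weighted Minkowski semi-norm rather than a true norm, which is exactly the "remove a dimension" behaviour the question asks for.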

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow