Question

I'm building a K-nearest neighbors classifier, and I'd like to compute all of my distances at once (the unvectorized version is taking a long time to run).

I have a test dataset of size 28000 examples x 784 features, and I have a training dataset of size 42000 examples x 784 features. The code that answers my question should result in a matrix of size 28000 x 42000, where every row contains the distance from that test example to each of the 42000 training examples.

The best I've come up with is using sum and bsxfun to compute all the distances at once for each test example, but I still need to loop through all 28000 test examples, and as I said earlier it's taking a while.


Solution

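pdist2(A, B) will do precisely what you need, where A is your test dataset and B is your training dataset. The order matters: pdist2 returns one row per row of A and one column per row of B, so passing the test set first gives the 28000 x 42000 matrix you asked for. Note that pdist2 is part of the Statistics Toolbox. Here is the reference: http://www.mathworks.com/help/stats/pdist2.html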
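If the toolbox is not available, the same Euclidean distance matrix can be built without a loop by expanding ||a - b||^2 = ||a||^2 + ||b||^2 - 2*a*b'. The following is only a sketch of that alternative, not what pdist2 does internally; the variable names (Xtest, Xtrain, D) and the small matrix sizes are placeholders chosen for illustration.

```matlab
% Small stand-in matrices (hypothetical sizes); the real data would be
% 28000 x 784 (test) and 42000 x 784 (train).
Xtest  = rand(5, 784);
Xtrain = rand(7, 784);

% Squared norm of every row.
sqTest  = sum(Xtest.^2, 2);      % 5 x 1
sqTrain = sum(Xtrain.^2, 2)';    % 1 x 7

% ||a||^2 + ||b||^2 - 2*a*b' for every (test, train) pair at once.
Dsq = bsxfun(@plus, sqTest, sqTrain) - 2 * (Xtest * Xtrain');

% Round-off can leave tiny negative values, so clamp before the square root.
D = sqrt(max(Dsq, 0));           % 5 x 7, row i = distances from test example i
```

Two things to keep in mind at full size: a 28000 x 42000 matrix of doubles needs about 9.4 GB, so it may be necessary to process the test set in blocks of rows or to store the data as single; and for nearest-neighbor ranking the square root can be skipped, since ordering by squared distance gives the same neighbors.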

Licensed under: CC-BY-SA with attribution