The core of this problem is to compute a distance matrix D
of size m x 3
that contains the pairwise distances between all data points in X
and all data points in C
. The Euclidean distance between the i-th vector x_i
in X
and the j-th vector c_j
in C
can be rewritten as:
|x_i-c_j|^2 = |x_i|^2 - 2<x_i, c_j> + |c_j|^2
where <,> refers to inner product. The right-hand side of this equation can be easily vectorized, because the inner product of all pairs is just X * C'
which is BLAS3 operation. This way of computing the distance matrix is known as dist2
function in the book Pattern Recognition and Machine Learning by Christopher Bishop. I copy the function below with a little modification.
function D = dist2(X, C)
tempx = full(sum(X.^2, 2));
tempc = full(sum(C.^2, 2).');
D = -2*(X * C.');
D = bsxfun(@plus, D, tempx);
D = bsxfun(@plus, D, tempc);
The full
here is used in case X
or C
is a sparse matrix.
Note: The distance matrix D
computed this way might have tiny negative entries due to numerical rounding error. To guard against this case, use
D = max(D, 0);
The indices of the closest vector in C
can be retrieved from D
:
[~, idx] = min(D, [], 2);