Generate data from k-means clusters

https://stackoverflow.com/questions/22945201

30-06-2023

Question

I have an input vector A, a row vector with 3,000 data points. Using MATLAB, I found 3 cluster centres for A.

Now that I have the 3 cluster centres, I have another row vector B with 3,000 points. Each element of B takes one of three values: 1, 2 or 3. For example, suppose the first 5 elements of B are

B(1,1:5) = [ 1 , 3, 3, 2, 1]

This means that B(1,1) belongs to cluster 1, B(1,2) belongs to cluster 3, etc. For every data point in the row vector B, I want to look up which cluster it belongs to by reading its value, and then replace it with a data value from that cluster.

After this is done, the first 5 elements of B would look like:

B(1,1:5) = [ 2.7 , 78.4, 55.3, 19, 0.3]

That is, B(1,1) is a data value picked from the first cluster (obtained from A), B(1,2) is a data value picked from the third cluster (obtained from A), and so on.

Solution

k-means keeps only the cluster means; it does not model the distribution of the data within each cluster.

You cannot generate artificial data sensibly from k-means clusters without additional statistics and distribution assumptions.
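
To illustrate what "additional statistics and distribution assumptions" could look like, here is a minimal sketch, not a definitive method. It assumes the cluster assignments of A are still available from kmeans (Statistics and Machine Learning Toolbox) and that the labels in B use the same cluster numbering. The data for A and B are hypothetical stand-ins, and the names idxA, Bemp and Bgauss are introduced only for this example. Option 1 resamples values actually observed in each cluster; option 2 assumes each cluster is roughly Gaussian, which is an assumption k-means itself does not justify.

```matlab
% Hypothetical stand-in data; replace with the real A and B.
rng(0);                                       % reproducible run
A = [randn(1,1000), 20 + 3*randn(1,1000), 60 + 10*randn(1,1000)];
B = randi(3, 1, 3000);                        % cluster labels in {1,2,3}

% kmeans expects observations in rows, so pass A as a column vector.
idxA = kmeans(A(:), 3);
idxA = idxA.';                                % row vector, same shape as A

% Option 1: draw (with replacement) from the observed members of each cluster.
Bemp = zeros(size(B));
for k = 1:3
    members = A(idxA == k);                   % values of A assigned to cluster k
    slots   = (B == k);                       % positions in B labelled k
    pick    = randi(numel(members), 1, nnz(slots));
    Bemp(slots) = members(pick);
end

% Option 2: ASSUMPTION: each cluster is approximately Gaussian.
Bgauss = zeros(size(B));
for k = 1:3
    members = A(idxA == k);
    slots   = (B == k);
    Bgauss(slots) = mean(members) + std(members) * randn(1, nnz(slots));
end
```

Option 1 can only return values already present in A. Option 2 produces new values, but they are only as realistic as the Gaussian assumption.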

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow