Bag of Visual Words in Opencv

https://stackoverflow.com/questions/19562484

01-07-2022
|

Question

I am using BOW in opencv for clustering the features of variable size. However one thing is not clear from the documentation of the opencv and also i am unable to find the reason for this question:

assume: dictionary size = 100.

I use surf to compute the features, and each image has variable size descriptors e.g.: 128 x 34, 128 x 63, etc. Now in BOW each of them are clustered and I get a fixed descriptor size of 128 x 100 for a image. I know 100 is the cluster center created using kmeans clustering.

But I am confused in that, if image has 128 x 63 descriptors, than how come it clusters into 100 clusters which is impossible using kmeans UNLESS i convert the descriptor matrix to 1D. Wont converting to 1D will lose valid 128 dimensional information of a single key points?

I need to know how is the descriptor matrix manipulated to get 100 cluter centers from only 63 features.

Solution

Think it like this.

You have 10 cluster means total and 6 features for current image. First 3 of those features are closest to 5th mean and remaining 3 of them are closest to 7th, 8th, and 9th mean respectively. Then your feature will be like [0, 0, 0, 0, 3, 0, 1, 1, 1, 0] or normalized version of this. Which is 10 dimensional, and that is equal to number of cluster mean. So you can create 100000 dimensional vector from 63 features if you want.

But still I think there is something wrong, because after you applied BOW, your features should be 1x100 not 128x100. Your cluster means are 128x1 and you are assigning your 128x1 sized features (you hvae 34 128x1 feature for first image, 63 128x1 feature for second image, etc.) to those means. So in basic you are assigning 34 or 63 features to 100 means, your result should be 1x100.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow