Question

I am trying to obtain feature vectors for roughly 1300 images in my data set. One of the features I have to implement is shape, so I plan to use SIFT descriptors. However, each image returns a different number of keypoints when I run

[F,D] = vl_sift(image);

F is of size 4 x N and D is of size 128 x N where N is the number of keypoints detected.

However, I want to obtain a single vector of size 128 x 1 that represents an image as well as possible. I have seen clustering and k-means mentioned, but I don't have any idea how to apply them here.

The most basic idea is to average these N vectors of size 128 x 1, which gives me a single feature vector. But is taking the average meaningful? Should I build some kind of histogram instead?

Any help will be appreciated. Thanks!


Solution

This is actually a big research problem. You are correct: averaging all the descriptors will not be meaningful. There are several approaches for creating a single vector out of a set of local descriptors. One big class of methods is called "bag of features" or "bag of visual words". The general idea is to cluster local descriptors (e.g. SIFT) pooled from many images (e.g. using k-means). Then you take a particular image, figure out which cluster each descriptor from that image belongs to, and create a histogram of the cluster counts. The histogram has one bin per cluster, so its length is the same for every image no matter how many keypoints were detected. There are different ways of doing the clustering and different ways of creating and normalizing the histogram.
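To make those steps concrete, here is a minimal sketch in plain Python (not MATLAB, so it runs without VLFeat). Everything in it is an assumption for illustration: the helper names `kmeans` and `bow_histogram` are made up, the k-means is a bare-bones version, and the toy descriptors are 2-D stand-ins for the 128-D columns of `D`. Note that the resulting vector has length k (the number of clusters), not 128.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two descriptors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centers, d):
    # Index of the cluster center ("visual word") closest to descriptor d.
    return min(range(len(centers)), key=lambda j: sq_dist(centers[j], d))

def kmeans(descriptors, k, iters=20, seed=0):
    # Bare-bones k-means: random initial centers, then alternate
    # assignment and mean-update steps for a fixed number of iterations.
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(descriptors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(centers, d)].append(d)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def bow_histogram(descriptors, centers):
    # Count how many descriptors fall into each cluster, then
    # L1-normalize so images with different keypoint counts are comparable.
    hist = [0.0] * len(centers)
    for d in descriptors:
        hist[nearest(centers, d)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Toy stand-ins for SIFT descriptors; the two "images" have
# different numbers of keypoints (4 and 2).
image_a = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.1], [5.0, 5.1]]
image_b = [[5.1, 4.9], [4.9, 5.0]]

# Build the vocabulary from descriptors pooled over all images.
vocab = kmeans(image_a + image_b, k=2)

# Each image becomes a fixed-length vector of length k.
hist_a = bow_histogram(image_a, vocab)
hist_b = bow_histogram(image_b, vocab)
```

With real data you would pool (a sample of) the descriptors from all your images, pick k much larger than 2 (values in the hundreds or thousands are common), and use a library k-means instead of this toy one.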

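A somewhat different approach is the "Pyramid Match Kernel", a kernel function that measures the similarity between two sets of local descriptors directly. It lets you train an SVM classifier on the sets themselves, without first reducing each image to a single vector.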

So for starters, search for "bag of features" or "bag of visual words".

Licensed under: CC-BY-SA with attribution