Question

I am hoping someone can tell me whether I'm on the right track. I'm trying to learn about image retrieval and SVMs, but it's all a bit confusing. I'll ask my questions by walking through the source code.

First I have a dataset of cats. For every cat picture I compute the descriptors using the SIFT algorithm (vlfeat). I stack all the descriptors (from every picture) together into one list and find the clusters (of all descriptors) with k-means. I chose k = 3 by trying it out and plotting the result.

Question 1: Is there a "terminal" way (i.e. without plotting) to see whether I chose a good k? Plotting a set of 128-dimensional descriptors from 50 cat pictures takes a long time.
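
One idea I had (not sure whether it is correct): compare the distortion value that scipy's kmeans returns for several values of k instead of plotting. A rough sketch, with random data standing in for the real descriptors:

import numpy
from scipy.cluster.vq import kmeans

# stand-in for the real (n, 128) array of stacked SIFT descriptors
all_desc = numpy.random.rand(1000, 128)
for k in (3, 10, 50, 100):
    _, distortion = kmeans(all_desc, k)   # mean distance of the points to their centers
    print(k, distortion)                  # pick a k where this stops dropping quickly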

Question 2: I'm doing list.append(hstack((loc,des))) with the locations and the descriptors. Is this the right way, or should I take only the descriptors?

import numpy
from numpy import hstack
from scipy.cluster.vq import kmeans   # assuming scipy's k-means is the one used here
import vlfeat_module

def get_features(datas):
    feats = []                                    # renamed: don't shadow the built-in `list`
    for data in datas:
        # keypoint locations and 128-d SIFT descriptors of one image
        loc, des = vlfeat_module.vlf_create_desc(data, 'tmp.sift')
        feats.append(hstack((loc, des)))          # see question 2: locations + descriptors
    desc = numpy.vstack(feats)                    # all descriptors in one array
    center, _ = kmeans(desc, 3)                   # k = 3 cluster centers
    return center

After getting the centers I write the 3 x 128-dimensional centers to a *.sparse file that looks like this:

1  1:333.756498151 2:241.935029943...
1  1:806.715774779 2:1134.68287451...
....
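
For reference, such a file could also be written with sklearn's dump_svmlight_file; just a sketch, where the file name and the random array are only examples:

import numpy
from sklearn.datasets import dump_svmlight_file

centers = numpy.random.rand(3, 128)   # stand-in for the 3 x 128 centers from get_features()
labels = numpy.ones(len(centers))     # label 1 = "cat"
dump_svmlight_file(centers, labels, './svm_files/cats.sparse',
                   zero_based=False)  # writes lines like "1 1:... 2:... ..."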

After this procedure with the cat pictures I repeat it with non-cat pictures and get a *.sparse file that looks like this:

0  1:101.905620535 2:250.9213760...
0  1:223.619957204 2:509.303625427...
...

I put both *.sparse files together and started training with the SVM (at least I think I started ^^):

from sklearn import svm
from sklearn.datasets import load_svmlight_file
from sklearn.metrics import accuracy_score

X_train, y_train = load_svmlight_file("./svm_files/cats_nonecats.sparse")
clf = svm.NuSVC(gamma=0.07, verbose=True)
clf.fit(X_train, y_train)
pred = clf.predict(X_train)            # note: predicting on the training data itself
print(accuracy_score(y_train, pred))

I get this result:

[LibSVM]*
optimization finished, #iter = 4
C = 2.000000
obj = 5.000000, rho = 0.000000
nSV = 10, nBSV = 0
Total nSV = 10
 NuSVC(cache_size=200, coef0=0.0, degree=3, gamma=0.07, kernel=rbf,
   max_iter=-1, nu=0.5, probability=False, shrinking=True, tol=0.001,
   verbose=True)
1.0

I don't think this is right, so maybe someone could explain my mistakes to me. Next question: is this already the "training", or do I have to repeat it, e.g. 10 times? Is the classifier now able to recognise cats?

Thank you in advance for any answers.

Greetings,

Linda

EDIT

Well, I'll try to explain what I'm doing right now. I hope it's correct now.

1. Split my data into training and test data
2. Get all descriptors from the training / test data
3. Create centers with k-means from the training data
4. Get a histogram vector from the descriptors of each training image
5. Create a sparse file from the histogram vectors
6. Feed this sparse file to the SVM

Any mistakes?

EDIT part II:

I've updated the number of pictures... but I have a few more questions. What did you mean by "np.bincount + divide by the sum"? If I have a histogram like [120, 0, 300, 80], do I divide these values by the total number of descriptors for that picture, i.e. [120/500, 0/500, 300/500, 80/500]? And is there a good way to choose the k for k-means? 100 to 500 may be the right k for cats, but what if I want to teach my classifier to recognise dogs? Would k be different?
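
So for one picture with 500 descriptors it would be something like this (just to check my understanding):

import numpy as np

hist = np.array([120, 0, 300, 80], dtype=float)   # 500 descriptors in this picture
hist = hist / hist.sum()
print(hist)                                       # [0.24 0.   0.6  0.16]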

Thank you


Solution

Basically you did the right thing, with a few small errors. First, for any machine learning approach, you should split your data into training and test sets before you do anything else. This is the only way to know whether you succeeded in building a cat classifier. You tested your approach on the training set and got perfect results - that tells you nothing.
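
For example, just a sketch, assuming the image file names and their labels sit in two plain lists and using sklearn's train_test_split:

from sklearn.model_selection import train_test_split

# hypothetical lists of image files and their labels (1 = cat, 0 = non-cat)
images = ['cat_01.jpg', 'cat_02.jpg', 'car_01.jpg', 'tree_01.jpg']
labels = [1, 1, 0, 0]

train_imgs, test_imgs, y_train, y_test = train_test_split(
    images, labels, test_size=0.25, random_state=0)
# fit k-means and the SVM on train_imgs only, then measure accuracy on test_imgs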

Second, for the bag-of-words approach, you don't use the cluster centers as the descriptors for the images. For each image, you look up how often each cluster appears (by applying predict to all the descriptors in that image) and then build a histogram of that (i.e. np.bincount + divide by the sum). This gives you a descriptor of length n_clusters (three in your case) for each image. These are what you feed to the classifier.
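
In code that could look roughly like this; a sketch that uses sklearn's KMeans (which has a predict method) instead of scipy's kmeans, with random arrays standing in for real descriptors:

import numpy as np
from sklearn.cluster import KMeans

n_clusters = 100                          # number of "visual words"
train_desc = np.random.rand(5000, 128)    # all descriptors of the training images
km = KMeans(n_clusters=n_clusters).fit(train_desc)

def bow_histogram(image_desc):
    # cluster index ("visual word") of every descriptor in one image
    words = km.predict(image_desc)
    hist = np.bincount(words, minlength=n_clusters).astype(float)
    return hist / hist.sum()              # normalized histogram of length n_clusters

# one such vector per image is what you feed to the SVM
print(bow_histogram(np.random.rand(300, 128)))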

Some less important remarks: depending on how varied your pictures are and what the negatives look like, you probably have much too few examples. Try at least 50-100. Also, your number of centers is way too small. Using three, you get a 3-dimensional vector describing each image. It is very unlikely that this still contains enough information to distinguish cats from non-cats. Try 100-500.

I should really write a blog post about how to do this... hopefully soon. Btw, if your cat pictures just contain one centered cat and not much else, you might want to try HOG instead of SIFT.

Cheers, Andy

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow