Question

I am attempting to build a classifier/predictor using SURF and a naive Bayes classifier. I am closely following the technique from "Visual Categorization with Bags of Keypoints" by Csurka, Dance, et al. (2004), except that I am using SURF instead of SIFT.

My results are pretty horrendous and I am not sure where my error lies. I am using 20 car samples (ham) and 20 motorcycle samples (spam) from the Caltech set. I suspect the problem is in the way I am creating my vocabulary: I can see that EMGU/OpenCV's cvKMeans2 returns different results given the same SURF descriptor input, which makes me suspicious. Here is my code so far.

public Matrix<float> Extract<TColor, TDepth>(Image<TColor, TDepth> image)
    where TColor : struct, Emgu.CV.IColor
    where TDepth : new()
{
    ImageFeature[] modelDescriptors;

    using (var imgGray = image.Convert<Gray, byte>())
    {
        var modelKeyPoints = surfCPU.DetectKeyPoints(imgGray, null);
        //each SURF descriptor is a 64-element vector describing the intensity pattern
        //surrounding the corresponding modelKeyPoint
        modelDescriptors = surfCPU.ComputeDescriptors(imgGray, null, modelKeyPoints);
    }

    //SURF descriptors have 64 elements, so DESCRIPTOR_COUNT = 64
    var samples = new Matrix<float>(modelDescriptors.Length, DESCRIPTOR_COUNT);
    for (int k = 0; k < modelDescriptors.Length; k++)
    {
        for (int i = 0; i < modelDescriptors[k].Descriptor.Length; i++)
        {
            samples.Data[k, i] = modelDescriptors[k].Descriptor[i];
        }
    }

    //group descriptors into clusters using k-means to form the feature vectors
    //create the "vocabulary" based on square-error partitioning k-means
    var centers = new Matrix<float>(CLUSTER_COUNT, samples.Cols, 1);
    //stop after 100 iterations or when the centers move less than 1e-3;
    //the parameterless MCvTermCriteria() is zero-initialized and gives k-means no stopping rule
    var term = new MCvTermCriteria(100, 0.001);
    var labelVector = new Matrix<int>(modelDescriptors.Length, 1);
    CvInvoke.cvKMeans2(samples, CLUSTER_COUNT, labelVector, term, 3, IntPtr.Zero, 0, centers, IntPtr.Zero);

    //this is the quantized feature vector as described in Csurka, Dance et al.,
    //"Visual Categorization with Bags of Keypoints" (2004)
    var histogram = new Matrix<float>(1, CLUSTER_COUNT);

    //quantize: count how many descriptors were assigned to each cluster
    for (int i = 0; i < labelVector.Rows; i++)
    {
        var value = labelVector.Data[i, 0];
        histogram.Data[0, value]++;
    }
    //normalize the histogram since images yield different numbers of keypoints
    histogram = histogram / histogram.Norm;
    return histogram;
}

The output is fed into a NormalBayesClassifier. This is how I train it:

Parallel.For(0, hamCount, i =>
{
    using (var img = new Image<Gray, byte>(_hams[i].FullName))
    {
        var features = _extractor.Extract(img);
        features.CopyTo(trainingData.GetRow(i));
        trainingClass.Data[i, 0] = 1;
    }
});

Parallel.For(0, spamCount, j =>
{
    using (var img = new Image<Gray, byte>(_spams[j].FullName))
    {
        var features = _extractor.Extract(img);
        //note the row offset: spam features must land in the rows after the ham
        //features, matching the j + hamCount offset used for the labels
        features.CopyTo(trainingData.GetRow(j + hamCount));
        trainingClass.Data[j + hamCount, 0] = 0;
    }
});

using (var classifier = new NormalBayesClassifier())
{
    if (classifier.Train(trainingData, trainingClass, null, null, false))
    {
        classifier.Save(_statModelFilePath);
    }
}

When I call Predict on the NormalBayesClassifier, it returns 1 (match) for all of the training samples, ham and spam alike.

Any help would be greatly appreciated.

Edit: One other note: I have tried CLUSTER_COUNT values from 5 to 500, all with the same result.


Solution

The problem was more conceptual than technical. I did not understand that the k-means clustering is supposed to build the vocabulary for the entire data set. The correct approach is to give the CvInvoke.cvKMeans2 call a training matrix containing all of the features from every image; I was rebuilding the vocabulary each time from a single image.

My final solution involved pulling the SURF code into its own method and running it on each ham and spam image. I then used the massive result set to build a training matrix and passed that to CvInvoke.cvKMeans2. Training took quite a long time; I have about 3000 images in total.
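Here is a minimal sketch of what that corrected pipeline looks like. It is illustrative only: ExtractSurfDescriptors is a hypothetical helper standing in for the SURF extraction code above (returning one Matrix<float> of raw 64-element descriptors, one row per keypoint, for a single image), and _hams/_spams are the same file collections used in the training loops.

//phase 1: pool the raw descriptors from every training image into one matrix
var perImage = new List<Matrix<float>>();
foreach (var file in _hams.Concat(_spams))
{
    perImage.Add(ExtractSurfDescriptors(file.FullName)); //hypothetical helper
}

int totalRows = perImage.Sum(m => m.Rows);
var pooled = new Matrix<float>(totalRows, DESCRIPTOR_COUNT);
int offset = 0;
foreach (var m in perImage)
{
    for (int r = 0; r < m.Rows; r++)
        for (int c = 0; c < m.Cols; c++)
            pooled.Data[offset + r, c] = m.Data[r, c];
    offset += m.Rows;
}

//phase 2: run k-means ONCE over the pooled descriptors to build the shared vocabulary
var centers = new Matrix<float>(CLUSTER_COUNT, DESCRIPTOR_COUNT);
var labels = new Matrix<int>(pooled.Rows, 1);
var term = new MCvTermCriteria(100, 0.001);
CvInvoke.cvKMeans2(pooled, CLUSTER_COUNT, labels, term, 3, IntPtr.Zero, 0, centers, IntPtr.Zero);

Each image is then histogrammed against the shared centers by assigning every descriptor to its nearest center:

private Matrix<float> Quantize(Matrix<float> descriptors, Matrix<float> centers)
{
    var histogram = new Matrix<float>(1, centers.Rows);
    for (int r = 0; r < descriptors.Rows; r++)
    {
        //find the vocabulary center nearest to this descriptor (squared Euclidean distance)
        int best = 0;
        double bestDist = double.MaxValue;
        for (int k = 0; k < centers.Rows; k++)
        {
            double dist = 0;
            for (int c = 0; c < descriptors.Cols; c++)
            {
                double diff = descriptors.Data[r, c] - centers.Data[k, c];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = k; }
        }
        histogram.Data[0, best]++;
    }
    return histogram / histogram.Norm; //normalize, as in the original Extract method
}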

My results were better: prediction was 100% accurate on the training data. My problem now is that I am likely suffering from overfitting, because the prediction rate is still poor on non-training data. I will experiment with the Hessian threshold in the SURF detector as well as the cluster count to see if I can reduce the overfitting; a rough sketch of that tuning loop follows.
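This is only a sketch under stated assumptions: BuildVocabulary, BuildHistograms, and Evaluate are placeholder helpers (not code from this project), and the image files are assumed to be split into trainFiles/holdoutFiles with matching label matrices. The idea is to compare training accuracy against holdout accuracy for each candidate cluster count; a large gap between the two is the signature of overfitting.

foreach (int clusterCount in new[] { 50, 100, 200, 400 })
{
    //build the vocabulary from the training split only, never from the holdout
    var vocabulary = BuildVocabulary(trainFiles, clusterCount);   //hypothetical helper
    var trainData = BuildHistograms(trainFiles, vocabulary);      //hypothetical helper
    var holdoutData = BuildHistograms(holdoutFiles, vocabulary);  //hypothetical helper

    using (var classifier = new NormalBayesClassifier())
    {
        classifier.Train(trainData, trainLabels, null, null, false);
        double trainAcc = Evaluate(classifier, trainData, trainLabels);       //hypothetical helper
        double holdoutAcc = Evaluate(classifier, holdoutData, holdoutLabels); //hypothetical helper
        //train accuracy near 100% with poor holdout accuracy indicates overfitting
        Console.WriteLine("k={0}: train={1:P0}, holdout={2:P0}", clusterCount, trainAcc, holdoutAcc);
    }
}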
