Question

I am just curious. I am new here, so please be considerate of my somewhat noobish question.

Let's say I am building an Android application with image recognition, where all processing, even the computationally intensive parts, has to happen on the mobile device's CPU.

I am at the stage where I have already processed the images and extracted some features from them. The set of images comes from a single building, where the app should recognize particular objects of interest (different windows, pictures, artefacts, the outside of the building). So it's a closed domain, and I can provide enough pictures of the objects from different angles. I plan to train a neural network and ship it with the app instead of an image-matching algorithm.

My idea is to extract keypoints and compute descriptors (using FREAK for keypoints and ORB for descriptors), and from those descriptors I would like to end up with a single file or array that looks something like this:

    Desc1  Desc2 Desc3 Desc4 DescN......... Class
_________________________________________________________________________________
Picture 1     0.121  0.923 0.553 0.22  0.28           "object1" 
Picture 2     0.22    0.53  0.54 0.55  0.32 .........."object1" (different scale, angle)
Picture 3     ....    ...    ...   ...  ..   .........."object2"
Picture N
Picture N+1

so I can give it to a neural network for training. However, I got stuck, as I have no idea how a binary feature/descriptor is represented in the matrix (class Mat in OpenCV), or how I would go about normalising these binary descriptors so I can feed them to a neural net (multi-layer perceptron) for training. (Even pseudo-code would help greatly.)
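
To make the goal concrete, here is a rough sketch of what I imagine the conversion would look like, assuming the descriptors end up as fixed-length binary strings; if each bit is mapped to 0.0/1.0, the inputs are already in [0,1], so no further normalisation seems necessary (the class and method names here are made up):

```java
import java.util.Arrays;

public class DescriptorFeatures {

    // Hypothetical helper: map a fixed-length binary descriptor string
    // (e.g. "0110...") to an MLP input vector with one unit per bit.
    // Bits are already 0 or 1, so the values are already normalised.
    public static double[] toFeatureVector(String binaryDescriptor) {
        double[] features = new double[binaryDescriptor.length()];
        for (int i = 0; i < binaryDescriptor.length(); i++) {
            features[i] = binaryDescriptor.charAt(i) == '1' ? 1.0 : 0.0;
        }
        return features;
    }

    public static void main(String[] args) {
        double[] v = toFeatureVector("0110");
        System.out.println(Arrays.toString(v)); // [0.0, 1.0, 1.0, 0.0]
    }
}
```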

Was it helpful?

Solution

I cannot give a complete answer to your question, because I'm not familiar with neural networks, but I can give you some ideas about the binary representation of ORB descriptors.

  1. You can't detect keypoints with FREAK. As the FREAK paper describes, you should detect keypoints with the FAST corner detector and then describe them with FREAK. If you want to recognize objects by ORB descriptors, you should use ORB for both keypoint detection and description. Note that ORB's keypoint detection can also be based on FAST; you can change this by altering the scoreType parameter (see the OpenCV documentation). Since you are using Android, you can set this parameter as described here

  2. About binary string descriptors: I also needed them, to implement a descriptor matcher with a MySQL query. Since Mat in OpenCV's Java bindings only exposes descriptor values as doubles, I've implemented a method to transform them to binary. The function below takes the Mat of descriptors and returns a List<String> with one binary string per descriptor (one per row of the Mat).

Here is the code:

import java.util.ArrayList;
import java.util.List;
import org.opencv.core.Mat;

public static List<String> descriptorToBinary(Mat descriptors) {

    List<String> binaryDesc = new ArrayList<String>();

    // One row of the Mat = one descriptor (one keypoint).
    for (int row = 0; row < descriptors.rows(); row++) {
        StringBuilder descBin = new StringBuilder();
        for (int col = 0; col < descriptors.cols(); col++) {
            // Mat.get() returns doubles even for CV_8U data,
            // so each element is a byte value in the range 0..255.
            int value = (int) descriptors.get(row, col)[0];
            String bits = Integer.toBinaryString(value);
            // Left-pad to a fixed width of 16 characters so every
            // element contributes the same number of characters.
            while (bits.length() < 16) {
                bits = "0" + bits;
            }
            descBin.append(bits);
        }
        binaryDesc.add(descBin.toString());
    }

    return binaryDesc;
}
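
The fixed-width padding step can be checked in isolation with plain Java, no OpenCV needed; a minimal sketch (the class and method names are my own):

```java
public class PadDemo {

    // Left-pad the output of Integer.toBinaryString to a fixed width,
    // so every descriptor value contributes the same number of characters.
    public static String toPaddedBinary(int value, int width) {
        StringBuilder bits = new StringBuilder(Integer.toBinaryString(value));
        while (bits.length() < width) {
            bits.insert(0, '0');
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        // 173 = 10101101 in binary, padded to 16 characters
        System.out.println(toPaddedBinary(173, 16)); // 0000000010101101
    }
}
```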

The returned list of strings will have the same size as the list you get by converting the MatOfKeyPoint to List<KeyPoint>.

Here is how I verified that these descriptors are correct:

  1. I matched the original Mat descriptors with the brute-force Hamming matcher, as described in the ORB paper.
  2. I recorded the distances returned by the matcher.
  3. Then I calculated the distances between the String descriptors of the same image.
  4. I verified that OpenCV's Hamming distances were the same as the distances between the String descriptors. They were, so the conversion from Mat to binary strings was performed correctly.
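
Step 3 above, computing distances between String descriptors, amounts to a character-wise Hamming distance; a self-contained sketch:

```java
public class HammingDistance {

    // Hamming distance between two equal-length binary strings:
    // the number of positions at which the characters differ.
    public static int distance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("Descriptors must have equal length");
        }
        int dist = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                dist++;
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        System.out.println(distance("0110", "1110")); // 1
    }
}
```

Note that left-padding each value with extra zeros does not change these distances, since identical padding positions never differ.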

So the binary descriptors associated with keypoints will look like this:

Picture 1: object1
  keypoint1 : 512bit binary descriptor (1s and 0s)
  keypoint2 : 512bit binary descriptor
  keypoint3 : 512bit binary descriptor
  ...
Picture 2: object2
  keypoint1 : 512bit binary descriptor
  keypoint2 : 512bit binary descriptor
  keypoint3 : 512bit binary descriptor
  ...

Now, about the multi-layer perceptron: I cannot help you with it. That is why I said at the start that my answer is incomplete. But I hope the comments I've given will help you solve your problem.

Other tips

Instead of trying to implement a classifier from scratch, have you considered Haar training? You can train it to detect several objects in an image.

The training process is long, though.

http://note.sonots.com/SciSoftware/haartraining.html

Hope it helps!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow