Train object detection without annotated data/bounding boxes

https://datascience.stackexchange.com/questions/19370

22-10-2019

Question

From what I can see, most object detection NNs (Fast(er) R-CNN, YOLO, etc.) are trained on data that includes bounding boxes indicating where in the picture the objects are located.

Are there algorithms that take only the full picture plus image-level label annotations and, on top of determining whether an image contains certain object(s), also indirectly learn the appropriate bounding box(es) for those objects?

Solution

Yes, there are models that do this. This link points to what I believe is one of the first papers. The main idea is called weakly supervised object detection.

The paper essentially makes three modifications.

They treat the usual hidden fully connected layers as convolutional layers. This works because a convolutional layer can be thought of as sliding the same fully connected network across the image, which yields a score at every location instead of a single score per image.
They add a global max pooling layer at the end of this convolutional layer. This is the operator that "highlights" the area of the final conv layer's output that has learned the pattern of the object being classified. Thresholding the per-location scores that feed this global max keeps only regions that respond significantly. They then use an algorithm to create a bounding box from such a region.
They suggest a new loss function suited to deciding whether each object is present or not. I think they assume a Bernoulli distribution for each class, which leads to a separate logistic regression per class (so several classes can be present in one image) instead of a softmax.
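
To make the three modifications a bit more concrete, here is a minimal pure-Python sketch of the general idea, not the paper's implementation. Everything in it is an assumption made for illustration: the function names (score_map, global_max_pool, box_from_map, multilabel_logistic_loss), the toy feature grid, the threshold value, and the choice of a tight box around the above-threshold cells as the "algorithm to create a bounding box".

```python
import math

def score_map(features, weights, bias):
    """Apply the same linear classifier at every location (a 1x1 convolution).

    features: H x W grid of feature vectors; weights/bias: one entry per class.
    Returns one H x W score map per class.
    """
    return [
        [[sum(w * x for w, x in zip(w_c, cell)) + b_c for cell in row]
         for row in features]
        for w_c, b_c in zip(weights, bias)
    ]

def global_max_pool(class_map):
    """Image-level score for one class: the maximum over all locations."""
    return max(max(row) for row in class_map)

def box_from_map(class_map, threshold):
    """Tight box (row_min, col_min, row_max, col_max) around cells above threshold.

    This is an illustrative stand-in for the box-extraction step.
    """
    hits = [(r, c) for r, row in enumerate(class_map)
            for c, s in enumerate(row) if s > threshold]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows), max(cols))

def multilabel_logistic_loss(scores, labels):
    """Sum of independent per-class logistic losses; labels are +1 or -1."""
    return sum(math.log1p(math.exp(-y * s)) for s, y in zip(scores, labels))

# Toy 3 x 4 feature grid with 2 features per cell (made-up numbers).
features = [
    [[0, 0], [0, 0], [0, 1], [0, 0]],
    [[0, 0], [2, 0], [3, 0], [0, 0]],
    [[0, 0], [1, 0], [2, 0], [0, 0]],
]
weights = [[1.0, 0.0], [0.0, 1.0]]  # class 0 reads feature 0, class 1 reads feature 1
bias = [0.0, 0.0]

maps = score_map(features, weights, bias)
image_scores = [global_max_pool(m) for m in maps]  # [3.0, 1.0]
box = box_from_map(maps[0], threshold=1.5)         # (1, 1, 2, 2)
loss = multilabel_logistic_loss(image_scores, [+1, -1])  # only class 0 is present
```

Note that only the image-level labels enter the loss. The box comes out of the score map as a by-product, which is what makes the setup weakly supervised.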

Take a look because it's pretty sweet and has been cited by a lot of other new exciting papers.

OTHER TIPS

Another approach is "Training object class detectors using only human verification"

We propose a new scheme for training object detectors which only requires annotators to verify bounding-boxes produced automatically by the learning algorithm. Our scheme iterates between re-training the detector, re-localizing objects in the training images, and human verification. We use the verification signal both to improve re-training and to reduce the search space for re-localisation, which makes these steps different to what is normally done in a weakly supervised setting.
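
The loop described in that abstract can be sketched roughly as follows. This is a toy simulation under loud assumptions, not the paper's method: the "detector" is a hypothetical scoring function returned by toy_retrain, the "annotator" is simulated by an IoU check against hidden ground truth, and all names, boxes, and the 0.5 IoU cut-off are made up for illustration.

```python
def box_area(b):
    """Area of a box given as (x0, y0, x1, y1)."""
    return (b[2] - b[0]) * (b[3] - b[1])

def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = box_area(a) + box_area(b) - inter
    return inter / union if union else 0.0

def verification_loop(candidates, retrain, verify, rounds):
    """Iterate re-training, re-localisation, and yes/no verification."""
    search_space = {img: list(boxes) for img, boxes in candidates.items()}
    accepted = {}
    for _ in range(rounds):
        score = retrain(accepted)              # re-train on verified boxes
        for img, boxes in search_space.items():
            if img in accepted or not boxes:
                continue
            best = max(boxes, key=score)       # re-localise with current detector
            if verify(img, best):              # annotator says yes: keep the box
                accepted[img] = best
            else:                              # annotator says no: shrink search space
                boxes.remove(best)
    return accepted

# Hypothetical stand-ins for the detector and the annotator.
truth = {"a": (2, 2, 6, 6), "b": (0, 0, 4, 4), "c": (1, 1, 5, 5)}

def toy_retrain(accepted):
    """Toy 'detector': prefers big boxes until it has verified examples,
    then prefers boxes whose area is close to the mean verified area."""
    if not accepted:
        return box_area
    mean_area = sum(box_area(b) for b in accepted.values()) / len(accepted)
    return lambda box: -abs(box_area(box) - mean_area)

def toy_verify(img, box):
    """Simulated annotator: accepts a box with IoU >= 0.5 against the truth."""
    return iou(box, truth[img]) >= 0.5

candidates = {
    "a": [(0, 0, 10, 10), (2, 2, 6, 7), (8, 8, 9, 9)],
    "b": [(0, 0, 4, 5), (5, 5, 9, 9), (0, 0, 10, 10)],
    "c": [(0, 0, 9, 9), (0, 0, 8, 8), (0, 0, 7, 7), (1, 1, 5, 6)],
}
accepted = verification_loop(candidates, toy_retrain, toy_verify, rounds=3)
```

In this toy run, image "c" is only localised correctly in the third round, after the scorer has been "re-trained" on the boxes verified for "a" and "b". Rejected boxes are dropped from the search space, so the same wrong proposal is never shown to the annotator twice.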

Licensed under: CC-BY-SA with attribution
