Viola Jones Experiments (training sets)

https://stackoverflow.com/questions/13738486

05-12-2021
Question

The paper "Robust Real-Time Face Detection" by Paul Viola and Michael Jones says: "4916 positive training examples were hand picked aligned, normalized, and scaled to a base resolution of 24x24. 10,000 negative examples were selected by randomly picking sub-windows from 9500 images which did not contain faces."

My question is: what do they mean by "hand picked aligned, normalized, and scaled to a base resolution of 24x24"?

Does "hand picked aligned" mean they have 4916 positive images of 4916 different faces? Does "normalized" mean each of the 4916 images has the same features (file size, file type, picture color (grayscale/colored))? Does "scaled to a base resolution of 24x24" mean each of the 4916 images is resized to 24x24 pixels?

Thanks for your time!

Solution

Does "hand picked aligned" mean they have 4916 positive images of 4916 different faces?

Not necessarily distinct, but yes, they used 4916 different photos of faces. The faces were located manually by a "human expert".

Does "normalized" mean each of the 4916 images has the same features (file size, file type, picture color (grayscale/colored))?

They used only grayscale pixels. "Normalized" means they made sure no picture was too dark or too bright: if a picture was very dark, it was automatically brightened, and if it was too bright, it was darkened. This step is easy to automate.
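To make the brightening/darkening idea concrete: as far as I recall, the paper describes this step as variance normalization of each sub-window, to reduce the effect of lighting. Below is a minimal sketch of that idea in plain Python, not the authors' code. The function name `variance_normalize`, the `eps` threshold, and the sample patches are my own assumptions for illustration.

```python
import math

def variance_normalize(patch, eps=1e-8):
    """Return a copy of a grayscale patch with zero mean and unit variance.

    `patch` is a list of rows of pixel intensities. A flat patch (zero
    variance) is returned as all zeros instead of dividing by zero.
    """
    pixels = [float(p) for row in patch for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = math.sqrt(variance)
    if std < eps:  # eps is an assumed guard value, not from the paper
        return [[0.0 for _ in row] for row in patch]
    return [[(float(p) - mean) / std for p in row] for row in patch]


# Two hypothetical patches with the same pattern but different brightness.
dark = [[10, 20], [30, 40]]
bright = [[210, 220], [230, 240]]
normalized_dark = variance_normalize(dark)
normalized_bright = variance_normalize(bright)
```

After this step a dark and a bright photo of the same pattern end up with the same mean and spread of pixel values, which is the sense in which very dark pictures get "brightened" and very bright ones get "darkened".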

Does "scaled to a base resolution of 24x24" mean each of the 4916 images is resized to 24x24 pixels?

Yes, they made sure each "face" is exactly 24x24 pixels by cropping the face region out of the picture and rescaling it.
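The crop-and-rescale step could look roughly like the sketch below. The quoted passage does not say which resampling method was used, so nearest-neighbour sampling is an assumption chosen for brevity, and `crop_and_scale`, the `box` layout, and the synthetic image are hypothetical. In practice you would probably use an image library with a smoother interpolation.

```python
def crop_and_scale(image, box, size=24):
    """Crop `box` = (top, left, height, width) from a grayscale image
    (a list of rows) and resample it to size x size by nearest neighbour."""
    top, left, height, width = box
    out = []
    for r in range(size):
        # Map the centre of each output pixel back into the cropped region.
        src_r = top + min(height - 1, int((r + 0.5) * height / size))
        row = []
        for c in range(size):
            src_c = left + min(width - 1, int((c + 0.5) * width / size))
            row.append(image[src_r][src_c])
        out.append(row)
    return out


# Hypothetical 100x100 image with a hand-marked 48x48 face box.
image = [[(r + c) % 256 for c in range(100)] for r in range(100)]
face = crop_and_scale(image, box=(30, 20, 48, 48))
print(len(face), len(face[0]))  # 24 24
```

Every training face then has the same 24x24 shape regardless of how large the face was in the original photo.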

Licensed under: CC-BY-SA with attribution