Question

I have a typical yearbook with photos and a name beneath each photo. Is there a programmatic way to scan all of the photos and save them with the name beneath the photo?

Solution

Yes - but unless your 'typical' school has more than 1000 students in the year, it's going to be easier to type the names in manually.

Finding the name box in the scan, isolating the text, OCR'ing it, and then wiring together all the software to crop and save the photos is going to take you a lot longer than the 2-3 seconds it takes to type each name.

edit - I don't know of any scanning software that does this - there might be something for newspapers.
If the layout of the yearbook is consistent (at least within the same book), you could scan a page and have either the batch mode in your favourite image app or some command-line tool split it into separate images based on pixel coordinates. You could then extract just the name box into a separate image and run OCR on that. If the books are relatively modern and were laid out in a DTP package with clean fonts, this should work well; older books with typewriter captions and paste-up layouts might be harder.
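As a rough sketch of the pixel-coordinate idea, assuming the portraits sit in a regular grid with the name in a fixed-height strip at the bottom of each cell: the code below only computes the crop rectangles and a file name from whatever text the OCR step returns. The function names, the grid parameters, and the `.png` extension are all made up for illustration; you would measure the real offsets from one scanned page.

```python
import re
from typing import NamedTuple


class Box(NamedTuple):
    """Pixel rectangle in (left, upper, right, lower) order."""
    left: int
    upper: int
    right: int
    lower: int


def cell_boxes(origin_x, origin_y, cell_w, cell_h, cols, rows, gap_x=0, gap_y=0):
    """Return one Box per portrait cell, row by row, for a regular grid."""
    boxes = []
    for row in range(rows):
        for col in range(cols):
            left = origin_x + col * (cell_w + gap_x)
            upper = origin_y + row * (cell_h + gap_y)
            boxes.append(Box(left, upper, left + cell_w, upper + cell_h))
    return boxes


def split_cell(cell, caption_h):
    """Split a cell into (photo box, name box); the name is the bottom strip."""
    split_y = cell.lower - caption_h
    photo = Box(cell.left, cell.upper, cell.right, split_y)
    caption = Box(cell.left, split_y, cell.right, cell.lower)
    return photo, caption


def filename_for(name, fallback):
    """Turn OCR output into a safe file name; use the fallback if nothing usable is left."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_")
    return f"{slug or fallback}.png"


# Hypothetical layout: 2 x 2 grid starting at (100, 200), 300 x 400 px cells.
cells = cell_boxes(100, 200, 300, 400, cols=2, rows=2, gap_x=20, gap_y=10)
photo_box, name_box = split_cell(cells[0], caption_h=60)
```

The boxes use the (left, upper, right, lower) order, so if you go with Pillow they can be passed straight to `Image.crop`, and the cropped name strip can then go to whatever OCR tool you pick (for example Tesseract). The fallback name matters because OCR will fail on some captions, and you still want those photos saved so they can be fixed by hand.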

Another alternative - depending on privacy issues - would be to crowdsource the problem, since presumably you aren't just doing this for your own amusement and want people from the school to be interested.
- Create a Facebook/MySpace/Flickr page (or whatever the cool kids are using this hour) for your school.
- Post each picture (or class shot) and ask people to enter the name - either from recognising the person or by reading the caption.
- Another approach is to post the pictures on your site as PDFs and have Google index them and do the OCR for you.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow