Question

I want to grab the subtitle text from a movie screenshot.

For an example screenshot, it should grab:

Hey, why don't we all just relax, huh?

There is no subtitle file involved; it is just a screenshot. Since the text is a subtitle, we know the font, size, etc., if that makes it easier to grab.

I know most of you will suggest a PHP OCR library, but since the background is always different, it looks like that won't work.


Solution

The background being different shouldn't be a problem: you can use an image library to remove anything that isn't the text colour.

Here's a quick example that gives a decent idea of what I mean. It replaces every pixel whose red, green and blue values are all 245 (#f5) or lower with black (#000000):

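<?php
// Load the screenshot (JPEGs are loaded as truecolor images).
$im = imagecreatefromjpeg("img.jpg");

// Allocate black once instead of once per pixel.
$black = imagecolorallocate($im, 0, 0, 0);

// Walk every pixel, counting down from the last column and row to 0.
for ($x = imagesx($im); $x--;)
{
    for ($y = imagesy($im); $y--;)
    {
        // Packed 0xRRGGBB value of the pixel.
        $rgb = imagecolorat($im, $x, $y);

        // Black out the pixel only if all three channels are at or below 245.
        if ((($rgb >> 16) & 0xFF) <= 245
            && (($rgb >> 8) & 0xFF) <= 245
            && ($rgb & 0xFF) <= 245)
        {
            imagesetpixel($im, $x, $y, $black);
        }
    }
}

// Send the cleaned-up image to the browser.
header("Content-Type: image/jpeg");
imagejpeg($im);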

Here's how the result looks:

You can probably chop most of the top part off since you know the subtitles will be at the bottom. Then just run it through an OCR library.
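As a rough sketch of the cropping step, something like the following could work with GD's `imagecrop()`. The helper name `cropBottom` and the 25% default are assumptions for illustration; the right fraction depends on where your subtitles actually sit.

```php
<?php
// Hypothetical helper: keep only the bottom $fraction of a GD image.
function cropBottom($im, float $fraction = 0.25)
{
    $width  = imagesx($im);
    $height = imagesy($im);

    // Number of rows to keep, at least one.
    $keep = max(1, (int) round($height * $fraction));

    // imagecrop() returns a new image, or false on failure.
    return imagecrop($im, [
        'x'      => 0,
        'y'      => $height - $keep,
        'width'  => $width,
        'height' => $keep,
    ]);
}

// Example use (file names are placeholders):
// $cropped = cropBottom(imagecreatefromjpeg("img.jpg"));
// imagepng($cropped, "subtitle.png");
```

Saving the cropped strip as PNG rather than JPEG avoids adding new compression artefacts around the letters before the OCR step.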

For PHP there's PhpOCR, although it has to be trained first with example letters.

It's probably better to use an external OCR library or command-line tool and call it from PHP. For external tools, there are Tesseract and OCRopus (I believe OCRopus is sponsored by Google too).
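Calling a command-line tool from PHP might look roughly like the sketch below, which assumes the `tesseract` binary is installed and on the PATH. The function names and the file name are made up for illustration. The `--psm 6` flag (treat the image as one uniform block of text) is the spelling used by Tesseract 4 and later, so check your version, and the `2>/dev/null` redirect assumes a Unix-like shell.

```php
<?php
// Hypothetical helper: build the shell command. Passing "stdout" as the
// output base makes tesseract print the recognised text instead of
// writing it to a file.
function buildTesseractCommand(string $imagePath, string $lang = 'eng'): string
{
    return 'tesseract ' . escapeshellarg($imagePath)
        . ' stdout -l ' . escapeshellarg($lang)
        . ' --psm 6';
}

// Hypothetical helper: run the command and return the trimmed text,
// or null if nothing could be read from the process.
function ocrImage(string $imagePath): ?string
{
    $output = shell_exec(buildTesseractCommand($imagePath) . ' 2>/dev/null');

    return is_string($output) ? trim($output) : null;
}

// Example use (file name is a placeholder):
// echo ocrImage("subtitle.png");
```

Escaping the path with `escapeshellarg()` matters if the file name ever comes from user input.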

Licensed under: CC-BY-SA with attribution