Question

1
00:00:00,074 --> 00:00:02,564
Previously on Breaking Bad...

2
00:00:02,663 --> 00:00:04,393
Words...

I need to parse SRT files with PHP and print all the subtitles in the file using variables.

I couldn't find the right regular expressions. While parsing, I need to extract the id, the time and the subtitle text into variables. When printing, there must be no Array() dumps or similar output; it must print exactly as in the original file.

That is, I need to print it like this:

$number <br> (e.g. 1)
$time <br> (e.g. 00:00:00,074 --> 00:00:02,564)
$subtitle <br> (e.g. Previously on Breaking Bad...)

By the way, I have this code, but it doesn't match the lines. It needs to be edited, but how?

$srt_file = file('test.srt',FILE_IGNORE_NEW_LINES);
$regex = "/^(\d)+ ([\d]+:[\d]+:[\d]+,[\d]+) --> ([\d]+:[\d]+:[\d]+,[\d]+) (\w.+)/";

foreach($srt_file as $srt){

    preg_match($regex,$srt,$srt_lines);

    print_r($srt_lines);
    echo '<br />';

}

Solution

Here is a short and simple state machine for parsing the SRT file line by line:

define('SRT_STATE_SUBNUMBER', 0);
define('SRT_STATE_TIME',      1);
define('SRT_STATE_TEXT',      2);
define('SRT_STATE_BLANK',     3);

$lines   = file('test.srt');

$subs    = array();
$state   = SRT_STATE_SUBNUMBER;
$subNum  = 0;
$subText = '';
$subTime = '';

foreach($lines as $line) {
    switch($state) {
        case SRT_STATE_SUBNUMBER:
            $subNum = trim($line);
            $state  = SRT_STATE_TIME;
            break;

        case SRT_STATE_TIME:
            $subTime = trim($line);
            $state   = SRT_STATE_TEXT;
            break;

        case SRT_STATE_TEXT:
            if (trim($line) == '') {
                $sub = new stdClass;
                $sub->number = $subNum;
                list($sub->startTime, $sub->stopTime) = explode(' --> ', $subTime);
                $sub->text   = $subText;
                $subText     = '';
                $state       = SRT_STATE_SUBNUMBER;

                $subs[]      = $sub;
            } else {
                $subText .= $line;
            }
            break;
    }
}

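if ($state == SRT_STATE_TEXT) {
    // If the file was missing the trailing blank line, we'll still be in this
    // state here. Build the last sub from the buffered values and add it.
    $sub = new stdClass;
    $sub->number = $subNum;
    list($sub->startTime, $sub->stopTime) = explode(' --> ', $subTime);
    $sub->text   = $subText;
    $subs[]      = $sub;
}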

print_r($subs);

Result:

Array
(
    [0] => stdClass Object
        (
            [number] => 1
            [stopTime] => 00:00:24,400
            [startTime] => 00:00:20,000
            [text] => Altocumulus clouds occur between six thousand
        )

    [1] => stdClass Object
        (
            [number] => 2
            [stopTime] => 00:00:27,800
            [startTime] => 00:00:24,600
            [text] => and twenty thousand feet above ground level.
        )

)

You can then loop over the array of subs or access them by array offset:

echo $subs[0]->number . ' says ' . $subs[0]->text . "\n";

To show all subs, loop over the array and display each one:

foreach($subs as $sub) {
    echo $sub->number . ' begins at ' . $sub->startTime .
         ' and ends at ' . $sub->stopTime . '.  The text is: <br /><pre>' .
         $sub->text . "</pre><br />\n";
}

Further reading: SubRip Text File Format

OTHER TIPS

That is not going to match, because your $srt_file array will look something like this:

Array
([0] => '1',
[1] => '00:00:00,074 --> 00:00:02,564',
[2] => 'Previously on Breaking Bad...',
[3] => '',
[4] => '2',
...
)

Your regex expects the number, the time range and the text on a single line, so it isn't going to match any of those elements.

If you intend to read the entire file into one long (memory-hogging) string, use file_get_contents() to get the entire file contents into one string, then use preg_match_all() to get all the regex matches.
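A minimal sketch of the file_get_contents() + preg_match_all() route. It assumes every block is a number line, a time line, then one or more text lines ending at a blank line (or at the end of the file). The function name parseSrtWithRegex and the array keys are hypothetical, not part of any library, and a string literal stands in for the file so the example runs on its own.

```php
<?php
// Hypothetical helper: parse a whole SRT string with one preg_match_all() call.
// In practice $content would come from file_get_contents('test.srt').
function parseSrtWithRegex(string $content): array
{
    // Normalise Windows/old Mac line endings to "\n" so one pattern fits all files.
    $content = str_replace(["\r\n", "\r"], "\n", $content);

    $pattern = '/^(\d+)\n'                                             // subtitle number
             . '(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\n' // time range
             . '(.+?)'                                                  // text, may span lines
             . '(?=\n\n|\n*\z)/ms';                                     // ends at blank line or EOF

    preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

    $subs = [];
    foreach ($matches as $m) {
        $subs[] = [
            'number' => (int) $m[1],
            'start'  => $m[2],
            'stop'   => $m[3],
            'text'   => $m[4],
        ];
    }
    return $subs;
}

$srt = "1\n00:00:00,074 --> 00:00:02,564\nPreviously on Breaking Bad...\n\n"
     . "2\n00:00:02,663 --> 00:00:04,393\nWords...\n";
$subs = parseSrtWithRegex($srt);

// Print each block the way it appears in the file, without any Array() dumps.
foreach ($subs as $sub) {
    echo $sub['number'], "<br>\n", $sub['start'], ' --> ', $sub['stop'], "<br>\n", $sub['text'], "<br>\n";
}
```

The m modifier lets ^ match at the start of every line, and s lets the lazy (.+?) run across line breaks, so a two-line subtitle stays inside one match.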

Otherwise, you could loop through the array and match various regex patterns to determine whether each line is an id, a time range, or text, and handle it accordingly. You will obviously also want some logic to make sure you are getting the values in the right order (id, then time range, then text).
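A rough sketch of that line-by-line approach, assuming the usual layout (number, time range, text lines, blank line). The function name parseSrtLines and its array keys are hypothetical; it throws on a line that arrives out of order rather than guessing, which is a design choice you may want to relax.

```php
<?php
// Hypothetical helper: classify each line from file() with a regex and
// enforce the order number -> time range -> text.
function parseSrtLines(array $lines): array
{
    $subs = [];
    $current = null;     // subtitle block being built
    $expect = 'number';  // what the next non-blank line should be

    foreach ($lines as $raw) {
        // Strip a UTF-8 byte-order mark (if any) and surrounding whitespace.
        $line = trim(preg_replace('/^\xEF\xBB\xBF/', '', $raw));

        if ($line === '') {
            // A blank line closes the current block.
            if ($current !== null && $expect === 'text') {
                $subs[] = $current;
            }
            $current = null;
            $expect = 'number';
        } elseif ($expect === 'number' && preg_match('/^\d+$/', $line)) {
            $current = ['number' => (int) $line, 'time' => '', 'text' => []];
            $expect = 'time';
        } elseif ($expect === 'time'
            && preg_match('/^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$/', $line)) {
            $current['time'] = $line;
            $expect = 'text';
        } elseif ($expect === 'text') {
            $current['text'][] = $line;  // text may span several lines
        } else {
            throw new UnexpectedValueException("Unexpected line: $line");
        }
    }
    // The last block may not be followed by a blank line.
    if ($current !== null && $expect === 'text') {
        $subs[] = $current;
    }
    return $subs;
}

// In practice: $lines = file('test.srt');
$lines = explode("\n", "1\n00:00:00,074 --> 00:00:02,564\nPreviously on Breaking Bad...\n\n"
    . "2\n00:00:02,663 --> 00:00:04,393\nWords...\n");
$subs = parseSrtLines($lines);

foreach ($subs as $sub) {
    echo $sub['number'], "<br>\n", $sub['time'], "<br>\n", implode("<br>\n", $sub['text']), "<br>\n";
}
```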

Group the file() array into chunks of 4 using array_chunk(); the fourth entry of each chunk is the blank line, so it can simply be ignored. Note that this only works if every subtitle has exactly one line of text:

foreach( array_chunk( file( 'test.srt'), 4) as $entry) {
    list( $number, $time, $subtitle) = $entry;
    echo $number . '<br />';
    echo $time . '<br />';
    echo $subtitle . '<br />';
}

I made a class to convert a .srt file to an array. Each entry of the array has the following properties:

  • id: a number representing the id of the subtitle (2)
  • start: float, the start time in seconds (24.443)
  • end: float, the end time in seconds (27.647)
  • startString: the start time in human readable format (00:00:24.443)
  • endString: the end time in human readable format (00:00:27.647)
  • duration: the duration of the subtitle, in ms (3204)
  • text: the text of the subtitle (the Peacocks ruled over Gongmen City.)

The code requires PHP 7:

<?php

namespace VideoSubtitles\Srt;


class SrtToArrayTool
{


    public static function getArrayByFile(string $file): array
    {

        $ret = [];

        $gen = function ($filename) {
            $file = fopen($filename, 'r');
            while (($line = fgets($file)) !== false) {
                yield rtrim($line);
            }
            fclose($file);
        };

        $c = 0;
        $item = [];
        $text = '';
        $n = 0;
        foreach ($gen($file) as $line) {

            if ('' !== $line) {
                if (0 === $n) {
                    $item['id'] = $line;
                    $n++;
                }
                elseif (1 === $n) {
                    $p = explode('-->', $line);
                    $start = str_replace(',', '.', trim($p[0]));
                    $end = str_replace(',', '.', trim($p[1]));
                    $startTime = self::toMilliSeconds(str_replace('.', ':', $start));
                    $endTime = self::toMilliSeconds(str_replace('.', ':', $end));
                    $item['start'] = $startTime / 1000;
                    $item['end'] = $endTime / 1000;
                    $item['startString'] = $start;
                    $item['endString'] = $end;
                    $item['duration'] = $endTime - $startTime;
                    $n++;
                }
                else {
                    if ($n >= 2) {
                        if ('' !== $text) {
                            $text .= PHP_EOL;
                        }
                        $text .= $line;
                    }
                }
            }
            else {
                if (0 !== $n) {
                    $item['text'] = $text;
                    $ret[] = $item;
                    $text = '';
                    $n = 0;
                }
            }
            $c++;
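        }
        // Flush the last entry when the file doesn't end with a blank line.
        if (0 !== $n) {
            $item['text'] = $text;
            $ret[] = $item;
        }
        return $ret;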
    }


    private static function toMilliSeconds(string $duration): int
    {
        $p = explode(':', $duration);
        return (int)$p[0] * 3600000 + (int)$p[1] * 60000 + (int)$p[2] * 1000 + (int)$p[3];
    }


}

Or check it out here: https://github.com/lingtalfi/VideoSubtitles
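The start, end and duration values in the property list above are plain arithmetic on the timestamp. Here is a small sketch of that calculation, checked against the example values from the list; srtTimestampToMs is a hypothetical helper, not part of the linked project, though it computes the same sum as the class's toMilliSeconds().

```php
<?php
// Hypothetical helper: convert an SRT timestamp "HH:MM:SS,mmm" to milliseconds.
// Accepts "," or "." before the milliseconds.
function srtTimestampToMs(string $ts): int
{
    if (!preg_match('/^(\d{2}):(\d{2}):(\d{2})[,.](\d{3})$/', trim($ts), $m)) {
        throw new InvalidArgumentException("Not an SRT timestamp: $ts");
    }
    return (int) $m[1] * 3600000 + (int) $m[2] * 60000 + (int) $m[3] * 1000 + (int) $m[4];
}

$startMs = srtTimestampToMs('00:00:24,443'); // 24443
$endMs   = srtTimestampToMs('00:00:27,647'); // 27647
echo $startMs / 1000, "\n";                  // start in seconds: 24.443
echo $endMs - $startMs, "\n";                // duration in ms: 3204
```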

You can use this project: https://github.com/captioning/captioning

Sample code:

<?php
require_once __DIR__.'/../vendor/autoload.php';

use Captioning\Format\SubripFile;

try {
    $file = new SubripFile('your_file.srt');

    foreach ($file->getCues() as $line) {
        echo 'start: ' . $line->getStart() . "<br />\n";
        echo 'stop: ' . $line->getStop() . "<br />\n";
        echo 'startMS: ' . $line->getStartMS() . "<br />\n";
        echo 'stopMS: ' . $line->getStopMS() . "<br />\n";
        echo 'text: ' . $line->getText() . "<br />\n";
        echo "=====================<br />\n";
    }

} catch(Exception $e) {
    echo "Error: ".$e->getMessage()."\n";
}

Sample output:

> php index.php
start: 00:01:48,387<br />
stop: 00:01:53,269<br />
startMS: 108387<br />
stopMS: 113269<br />
text: [Persian text, mis-encoded by the console] x265<br />
=====================<br />
start: 00:02:09,360<br />
stop: 00:02:12,021<br />
startMS: 129360<br />
stopMS: 132021<br />
text: [two lines of Persian text, mis-encoded by the console]<br />
=====================<br />
start: 00:02:12,022<br />
stop: 00:02:14,725<br />
startMS: 132022<br />
stopMS: 134725<br />
text: [two lines of Persian text, mis-encoded by the console]<br />
=====================<br />

It can be done by splitting on the line breaks. I did it successfully; let me show my code:

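// $movie->SRT holds the whole subtitle text. Blocks are separated by a blank
// line, i.e. two line breaks; this pattern only matches Windows (CRLF) endings.
$srt = preg_split("/\\r\\n\\r\\n/", trim($movie->SRT));
$result[$i]['IMDBID'] = $movie->IMDBID;
$result[$i]['TMDBID'] = $movie->TMDBID;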

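Here $movie->SRT holds subtitle text in the format you posted in the question. As you can see, the subtitle blocks are separated by two newlines (a blank line), so splitting on that gives one array element per subtitle. Hope that answers your question.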
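A sketch that takes the split-on-blank-lines idea one step further: each block is split again into number, time range and text. It uses \R so it also handles Unix (LF) files, not only CRLF ones. The function name splitSrtBlocks and the array keys are hypothetical, and a string literal stands in for the file contents.

```php
<?php
// Hypothetical helper: split on blank lines, then split each block into its parts.
function splitSrtBlocks(string $content): array
{
    // \R matches \r\n, \n or \r; \s* also allows whitespace-only "blank" lines.
    $blocks = preg_split('/\R\s*\R/', trim($content));

    $subs = [];
    foreach ($blocks as $block) {
        $lines = preg_split('/\R/', trim($block));
        if (count($lines) < 3) {
            continue;  // skip blocks without number, time and text
        }
        $subs[] = [
            'number' => (int) array_shift($lines),
            'time'   => trim(array_shift($lines)),
            'text'   => implode("\n", $lines),  // remaining lines are the text
        ];
    }
    return $subs;
}

// In practice: $content = file_get_contents('test.srt');
$content = "1\r\n00:00:00,074 --> 00:00:02,564\r\nPreviously on Breaking Bad...\r\n\r\n"
         . "2\r\n00:00:02,663 --> 00:00:04,393\r\nWords...\r\n";
$subs = splitSrtBlocks($content);

foreach ($subs as $sub) {
    echo $sub['number'], "<br>\n", $sub['time'], "<br>\n", nl2br($sub['text']), "<br>\n";
}
```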

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow