Question

I need a regular expression to extract the translated count, fuzzy count, and total string count from a PO (language translation) file.

I am using PHP for the program. I searched everywhere but couldn't find one.

Please help me.


Solution 2

Try this regex:

// Assumes $po_content holds the PO file contents and $fuzzy is a boolean.
$total = array();
$translated = array();
$extra = '';

// If $fuzzy is true, only entries flagged "#, fuzzy" are matched,
// so the translated count becomes the fuzzy count.
if ($fuzzy) {
    $extra = '#, fuzzy\n';
}

$matched = preg_match_all('/'.$extra.'msgid\s+((?:".*(?<!\\\\)"\s*)+)\s+'.'msgstr\s+((?:".*(?<!\\\\)"\s*)+)/', $po_content, $matches);

for ($i = 0; $i < $matched; $i++) {
    // Strip the outer quotes from the captured msgid and msgstr.
    $msgid = substr(rtrim($matches[1][$i]), 1, -1);
    $msgstr = substr(rtrim($matches[2][$i]), 1, -1);

    if (trim($msgid) != "") {
        $total[] = $msgid;
    }
    // Skip the header entry, whose msgstr holds metadata such as Language-Team.
    if (trim($msgstr) != "" && strpos($msgstr, 'Language-Team') === false && strpos($msgstr, 'MIME-Version') === false) {
        $translated[] = $msgstr;
    }
}

The total count is then count($total), and the translated count is count($translated).
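If the regex turns out to be brittle on your files (multi-line strings, plural forms, extra flags on the "#," line), a line-by-line pass is one alternative. Below is a minimal sketch, not taken from any library: the function name po_count_stats and its result keys are made up for illustration. It assumes the header entry is the one with an empty msgid, counts a fuzzy entry with a non-empty msgstr as fuzzy rather than translated, does not look at msgctxt, and skips obsolete (#~) entries.

```php
<?php
// Hypothetical helper: tallies PO entries by scanning the file line by line.
function po_count_stats(string $po_content): array
{
    $stats = array('total' => 0, 'translated' => 0, 'fuzzy' => 0, 'untranslated' => 0);
    $entry = null;          // entry being read: msgid, msgstr, fuzzy flag
    $field = null;          // field that continuation lines ("...") belong to
    $pendingFuzzy = false;  // set by a "#, fuzzy" line until the next msgid

    // Strip the surrounding double quotes from a PO string line.
    $unquote = function (string $s): string {
        return preg_match('/^"(.*)"$/s', trim($s), $m) ? $m[1] : '';
    };

    // Close the current entry and update the counters.
    $flush = function () use (&$entry, &$stats) {
        if ($entry !== null && $entry['msgid'] !== '') { // empty msgid = header
            $stats['total']++;
            if ($entry['msgstr'] === '') {
                $stats['untranslated']++;
            } elseif ($entry['fuzzy']) {
                $stats['fuzzy']++;
            } else {
                $stats['translated']++;
            }
        }
        $entry = null;
    };

    foreach (preg_split('/\R/', $po_content) as $line) {
        $line = trim($line);
        if (strpos($line, '#,') === 0) {
            // Flag line, e.g. "#, fuzzy, php-format".
            $flags = array_map('trim', explode(',', substr($line, 2)));
            if (in_array('fuzzy', $flags, true)) {
                $pendingFuzzy = true;
            }
        } elseif (strpos($line, 'msgid ') === 0) {
            $flush();
            $entry = array('msgid' => $unquote(substr($line, 6)), 'msgstr' => '', 'fuzzy' => $pendingFuzzy);
            $pendingFuzzy = false;
            $field = 'msgid';
        } elseif (strpos($line, 'msgstr') === 0 && $entry !== null) {
            // Covers both msgstr and plural forms such as msgstr[0].
            $pos = strpos($line, '"');
            if ($pos !== false) {
                $entry['msgstr'] .= $unquote(substr($line, $pos));
            }
            $field = 'msgstr';
        } elseif ($line !== '' && $line[0] === '"' && $entry !== null && $field !== null) {
            $entry[$field] .= $unquote($line); // continuation of a multi-line string
        } else {
            $field = null; // blank line, comment, msgctxt, msgid_plural, obsolete entry
        }
    }
    $flush();

    return $stats;
}
```

Usage would be something like $stats = po_count_stats(file_get_contents('messages.po')); and then reading $stats['total'], $stats['translated'] and $stats['fuzzy']. For anything beyond a quick count, a real PO parser is the safer choice.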

OTHER TIPS

gettext PO files are so old and ubiquitous that they are a de facto industry standard, with great support from a wide variety of tools. Trying to reinvent a solution using regexes here seems inappropriate when you could use one of the many PO file parsers instead. For example, with oscarotero/Gettext:

$translations = Gettext\Extractors\Po::extract('messages.po');

$total = $translated = $fuzzy = 0;

foreach ($translations as $translation) {
    $total++;

    if ($translation->hasTranslation()) {
        $translated++;
    }

    // Depending on the library version, flags such as "fuzzy" may be
    // exposed via getFlags() rather than getComments().
    if (in_array('fuzzy', $translation->getComments())) {
        $fuzzy++;
    }
}

(Untested, but should work immediately or with slight changes.)

In fact, there are already tools that do this, such as Translate Toolkit (pocount) and Pology (posieve), to name the ones I know of:

$ pocount locale/ko/LC_MESSAGES/

data/locale/ko/LC_MESSAGES/messages.po
type              strings      words (source)    words (translation)
translated:       3 (  0%)          7 (  0%)              28
fuzzy:            0 (  0%)          0 (  0%)             n/a
untranslated:   729 ( 99%)       1065 ( 99%)             n/a
Total:          732              1072                     28

unreviewed:       3 (  0%)          7 (  0%)              28
empty:          729 ( 99%)       1065 ( 99%)               0

$ posieve stats locale/ko/
-              msg   msg/tot   w-or   w/tot-or   w-tr   ch-or   ch-tr
translated       3      0.4%     15       0.9%     26      93     114
fuzzy            0      0.0%      0       0.0%      0       0       0
untranslated   729     99.6%   1708      99.1%      0   17323       0
total          732         -   1723          -     26   17416     114
obsolete         0         -      0          -      0       0       0
Licensed under: CC-BY-SA with attribution