Question

I have a MySQL database with book titles in both English and Arabic and I'm using a PHP class that can automatically transliterate Arabic text into Latin script.

I'd like my output HTML to look something like this:

<h3>A book</h3>
<h3>كتاب <em>(kitaab)</em></h3>
<h3>Another book</h3>

Is there a way for PHP to determine the language of a string based on the Unicode characters and glyphs used in it? I'm trying to get something like this:

$Ar = new Arabic('EnTransliteration');
while ($item = mysql_fetch_array($results)) {
    ...
    if (some test to see if $item['item_title'] has Arabic glyphs in it) {
      echo "<h3>$item[item_title] <em>(" . $Ar->ar2en($item['item_title']) . ")</em></h3>";
    } else {
      echo "<h3>$item[item_title]</h3>";
    }
    ...
}

Fortunately the class doesn't choke when fed Latin characters, so in theory I could send every result through the transformation, but that seems like a waste of processing.

Thanks!

Edit: I still haven't found a way to check for glyphs or characters. I suppose I could put all the Arabic characters in an array and check if anything in the array matches a part of the string...

I did, however, figure out an interim solution that might work fine in the end. It puts every title through the transformation regardless of language, but only outputs the parenthetical transliteration if the string was changed:

while ($item = mysql_fetch_array($mysql_results)) {
    $transliterate = trim(strtolower($Ar->ar2en($item['item_title'])));
    $item_title = (strtolower($item['item_title']) == $transliterate) ? $item['item_title'] : $item['item_title'] . " <em>($transliterate)</em>";

    echo "<h3>$item_title</h3>";
}

Solution

This should do it:

preg_match("/\p{Arabic}/u", $item['item_title'])

You could make that regular expression a bit more sophisticated if you want to, but I don't think you really need to.

The \p escape sequence lets you select characters based on their Unicode properties (when the u pattern modifier is used).
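As a rough sketch, the test could be wrapped in a small helper and dropped into the loop from the question. The helper name `containsArabic` and the sample titles are made up for illustration (they are not part of the question or of any library), the type hints assume PHP 7 or later, and the call to the transliteration class is left as a comment because it depends on the asker's setup:

```php
<?php
// Hypothetical helper: true if $text contains at least one character
// whose Unicode script is Arabic. Assumes $text is valid UTF-8.
function containsArabic(string $text): bool
{
    // preg_match returns 1 on a match, 0 on no match, and false on error
    // (for example, invalid UTF-8 when the u modifier is used).
    return preg_match('/\p{Arabic}/u', $text) === 1;
}

// Sample titles standing in for the database rows.
$titles = ['A book', 'كتاب', 'Another book'];

foreach ($titles as $title) {
    $safe = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    if (containsArabic($title)) {
        // In the real loop, the transliteration (e.g. $Ar->ar2en($title))
        // would replace the placeholder text below.
        echo "<h3>$safe <em>(transliteration goes here)</em></h3>\n";
    } else {
        echo "<h3>$safe</h3>\n";
    }
}
```

Single quotes around the pattern keep PHP from interpreting the backslash, so `\p` reaches the regex engine unchanged.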

The PHP manual mentions: "Extended properties such as "Greek" or "InMusicalSymbols" are not supported by PCRE." That is no longer entirely true: PCRE release 6.5 added support for script names such as Arabic.
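If you are unsure whether the PCRE build behind your PHP installation accepts script names, one way to find out is to try compiling the pattern against an empty string. This is only a sketch: the function name `pcreSupportsScriptNames` is hypothetical, and it relies on `preg_match` returning `false` when a pattern cannot be compiled.

```php
<?php
// Hypothetical helper: reports whether the local PCRE build accepts
// script names such as \p{Arabic} together with the u modifier.
function pcreSupportsScriptNames(): bool
{
    // A pattern that cannot be compiled makes preg_match return false
    // and raise a warning, which the @ operator suppresses here.
    // A successful compile against an empty subject returns 0.
    return @preg_match('/\p{Arabic}/u', '') !== false;
}

// PCRE_VERSION is a built-in constant holding the PCRE library version.
echo 'PCRE version: ', PCRE_VERSION, "\n";
echo pcreSupportsScriptNames()
    ? "Script names are supported\n"
    : "Script names are not supported\n";
```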

OTHER TIPS

Here's an open-source PHP class for automatic detection of Arabic character sets:

http://www.ar-php.com/php/arabic/index.html#ArCharsetD

Licensed under: CC-BY-SA with attribution