Question

Transliterator::listIDs() will list IDs, but apparently it's not a complete list.

In the example from this page, the ID looks like:

Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove; Lower();

which is kind of weird, because IDs are supposed to be unique. This looks more like a rule, but it doesn't work if I pass it to the createFromRules method :)

Anyway, I'm trying to remove any punctuation from the string, except dash (-), or characters from a specific list.

Do you know if that's possible? Or is there some documentation that better explains the syntax for the transliterator?


Solution

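The IDs that Transliterator::listIDs() returns are the "basic IDs". The example you gave is a "compound ID": several basic IDs, each optionally preceded by a filter set such as [:Punctuation:], chained with semicolons. You can see the ICU docs on this.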

You can also create your own rules with Transliterator::createFromRules().
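
A likely reason the compound ID from the question fails in createFromRules() is that rule syntax expects every transform ID to be introduced with "::". The following is a minimal sketch under that assumption, not a definitive recipe; it assumes the intl extension is loaded with ICU's standard transliteration data, and the variable names and sample string are purely illustrative.

```php
<?php
// Rule syntax: each transform ID from the compound ID is prefixed with "::".
$rules = ':: Any-Latin; :: NFD; :: [:Nonspacing Mark:] Remove; :: NFC; '
       . ':: [:Punctuation:] Remove; :: Lower();';

$transliterator = Transliterator::createFromRules($rules, Transliterator::FORWARD);
if ($transliterator === null) {
    // createFromRules() returns null on failure; the intl error says why.
    throw new RuntimeException(intl_get_error_message());
}

// Illustrative input: accents, a comma, a dash and an exclamation mark.
$result = $transliterator->transliterate('Café, Déjà-vu!');
echo $result, "\n";
```

Note that this removes the dash as well, because "-" belongs to the Punctuation category.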

You can take a look at the predefined rules:

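<?php
// Open ICU's transliteration data bundle, e.g. "icudt50l-translit"
// (the "l" suffix denotes a little-endian data build).
$a = new ResourceBundle(NULL, sprintf('icudt%dl-translit', INTL_ICU_VERSION), true);

foreach ($a['RuleBasedTransliteratorIDs'] as $name => $v) {
    $file = @$v['file'];
    if (!$file) {
        // No 'file' entry: fall back to the 'internal' entry.
        $file = $v['internal'];
        echo $name, " (direction $file[direction]; internal)\n";
    } else {
        echo $name, " (direction: $file[direction])\n";
        // Print the rules stored for this ID.
        echo $file['resource'];
    }
    echo "\n--------------\n";
}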

After formatting, the result looks like this.

Other suggestions

Just in case someone wants a working example: the example mentioned (from the PHP manual) uses procedural style. To make it work in object-oriented style, use create() instead of createFromRules(), because the string is a compound ID rather than a set of rules.

function removePunctuation($string) {
    // Compound ID: transliterate to Latin, strip accents, then strip punctuation.
    $transliterator = Transliterator::create("Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove;", Transliterator::FORWARD);

    return $transliterator->transliterate($string);
}
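
To keep the dash (or any other characters from a list), as the original question asks, the filter set can probably be narrowed with a UnicodeSet difference of the form [[:Punctuation:]-[...]]. The sketch below is one way to do that, assuming the intl extension (including IntlChar, PHP 7+) is available; removePunctuationExcept() and its parameters are hypothetical names, not part of the intl API.

```php
<?php
// Hypothetical helper (not part of intl): removes punctuation from $text
// but keeps every character listed in $keep.
function removePunctuationExcept(string $text, string $keep = '-'): string
{
    // Escape each kept character as \UXXXXXXXX so that set metacharacters
    // such as "-", "]" or "^" cannot break the UnicodeSet pattern.
    $escaped = '';
    foreach (preg_split('//u', $keep, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $escaped .= sprintf('\\U%08X', IntlChar::ord($char));
    }

    // Set difference: all punctuation minus the kept characters.
    $id = $escaped === ''
        ? '[:Punctuation:] Remove'
        : '[[:Punctuation:]-[' . $escaped . ']] Remove';

    $transliterator = Transliterator::create($id);
    if ($transliterator === null) {
        // create() returns null when the ID cannot be parsed.
        throw new RuntimeException(intl_get_error_message());
    }

    $result = $transliterator->transliterate($text);
    if ($result === false) {
        throw new RuntimeException(intl_get_error_message());
    }
    return $result;
}
```

For example, removePunctuationExcept('Hello, world - ok!') keeps the dash, and passing '-!' as the second argument keeps the exclamation mark too. The escaping step is a design choice: it lets the keep-list contain characters that are special inside a set without the caller having to quote them.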
Licensed under: CC-BY-SA with attribution