Question

I have this very long transliteration:

$text =~ tr/áàăâǎåǻäǟãȧǡąāȁȃɑʙƀɓƃćĉčċçȼƈɕʗďđðɖɗƌȡéèĕêěëėȩęēȅȇɇɛ/aaaaaaaaaaaaaaaaabbbbcccccccccdddddddeeeee/;
# Etc. (About 400 chars)

I want to split it into several transliterations since the resulting code would be easier to maintain:

$text =~ tr/áàăâǎåǻäǟãȧǡąāȁȃɑ/aaaaaaaaaaaaaaaaa/;
$text =~ tr/ʙƀɓƃ/bbbb/;
$text =~ tr/ćĉčċçȼƈɕʗ/ccccccccc/;
# Etc.

I believe that is going to slow things down, but I'd like to know for sure. This process runs about 1000 times per second on a pretty busy server.

Thanks.

Solution

Here is a benchmark:

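use strict;
use warnings;
use utf8;    # the script contains non-ASCII literals; treat them as characters
use Benchmark qw(:all);

my $str = 'áàăâǎåǻäǟãȧǡąāȁȃɑʙƀɓƃćĉčċçȼƈɕʗďđðɖɗƌȡéèĕêěëėȩęēȅȇɇɛ/aaaaaaaaaaaaaaaaabbbbcccccccccdddddddeeeee';
my $count = -2;    # negative: run each case for at least 2 CPU seconds
# Note: both cases modify $str in place, so after the first iteration
# they scan a string that has already been transliterated.
cmpthese($count, {
    'one tr' => sub {
        $str =~ tr/áàăâǎåǻäǟãȧǡąāȁȃɑʙƀɓƃćĉčċçȼƈɕʗďđðɖɗƌȡéèĕêěëėȩęēȅȇɇɛ/aaaaaaaaaaaaaaaaabbbbcccccccccdddddddeeeee/;
    },
    'multi tr' => sub {
        $str =~ tr/áàăâǎåǻäǟãȧǡąāȁȃɑ/aaaaaaaaaaaaaaaaa/;
        $str =~ tr/ʙƀɓƃ/bbbb/;
        $str =~ tr/ćĉčċçȼƈɕʗ/ccccccccc/;
        $str =~ tr/ďđðɖɗƌȡ/ddddddd/;
        $str =~ tr/éèĕêěëėȩęēȅȇɇɛ/eeeee/;
    },
});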

Result:

              Rate multi tr   one tr
multi tr 1215538/s       --     -81%
one tr   6271883/s     416%       --

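In this run, a single tr is about five times faster than the five separate tr operations (roughly 6.3 million versus 1.2 million iterations per second).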
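The benchmark above transliterates $str in place, so only the first iteration sees accented characters. The sketch below is one way to time both variants against an unmodified input on every iteration. The input string and the names one_tr and multi_tr are made up for illustration, and tr///r needs Perl 5.14 or later. No timings are shown, since they depend on the machine and the input.

```perl
use strict;
use warnings;
use utf8;    # the script contains non-ASCII literals
use Benchmark qw(cmpthese);

# Hypothetical input: a short accented sample repeated to a more realistic length.
my $input = 'áàăâ ʙƀɓƃ ćĉčċ ďđðɖ éèĕê ' x 20;

# Single pass. The /r flag returns a transliterated copy and leaves the
# argument untouched.
sub one_tr {
    return $_[0] =~ tr/áàăâǎåǻäǟãȧǡąāȁȃɑʙƀɓƃćĉčċçȼƈɕʗďđðɖɗƌȡéèĕêěëėȩęēȅȇɇɛ/aaaaaaaaaaaaaaaaabbbbcccccccccdddddddeeeee/r;
}

# Five passes over a private copy of the argument.
sub multi_tr {
    my $s = $_[0];
    $s =~ tr/áàăâǎåǻäǟãȧǡąāȁȃɑ/aaaaaaaaaaaaaaaaa/;
    $s =~ tr/ʙƀɓƃ/bbbb/;
    $s =~ tr/ćĉčċçȼƈɕʗ/ccccccccc/;
    $s =~ tr/ďđðɖɗƌȡ/ddddddd/;
    $s =~ tr/éèĕêěëėȩęēȅȇɇɛ/eeeee/;
    return $s;
}

# Each case makes exactly one copy of $input per iteration, so the
# comparison stays fair.
cmpthese(-1, {
    'one tr'   => sub { my $out = one_tr($input) },
    'multi tr' => sub { my $out = multi_tr($input) },
});
```

Because both subs return a copy, they can also be compared for equality as a sanity check that splitting the transliteration does not change the output.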

Other suggestions

You could build a transliterator:

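my %translit = (
   'áàăâǎåǻäǟãȧǡąāȁȃɑ' => 'a',
   'ʙƀɓƃ'              => 'b',
   'ćĉčċçȼƈɕʗ'         => 'c',
);

# Build one search list and one replacement list of equal length.
# (length() counts characters only if the source is read with "use utf8".)
my $pat  = '';
my $repl = '';
for (keys(%translit)) {
   $pat  .= $_;
   $repl .= $translit{$_} x length($_);
}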

my $tr1 = eval "sub { tr/\Q$pat\E/\Q$repl\E/ }" or die $@;
   -or-
my $tr2 = eval "sub { \$_[0] =~ tr/\Q$pat\E/\Q$repl\E/ }" or die $@;

Then use it like this:

$tr1->() for $str;
   -or-
$tr2->($str);
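Put together as one script, the approach might look like the following. This is a sketch, not a drop-in implementation: the duplicate check and the name $to_ascii are additions for illustration, and the generated sub returns a transliterated copy rather than modifying its argument.

```perl
use strict;
use warnings;
use utf8;    # the script contains non-ASCII literals

# One line per target letter, which keeps the table easy to maintain.
my %translit = (
    'áàăâǎåǻäǟãȧǡąāȁȃɑ' => 'a',
    'ʙƀɓƃ'              => 'b',
    'ćĉčċçȼƈɕʗ'         => 'c',
);

# Concatenate every group into one search list and one replacement list.
my ($pat, $repl) = ('', '');
my %seen;
for my $from (sort keys %translit) {
    # Added sanity check: a character listed in two groups is an error.
    for my $ch (split //, $from) {
        die sprintf("U+%04X is listed twice\n", ord $ch) if $seen{$ch}++;
    }
    $pat  .= $from;
    $repl .= $translit{$from} x length $from;
}

# tr/// does not interpolate variables, so the operator is compiled once
# with a string eval; the resulting sub performs a single pass per call.
my $to_ascii = eval "sub { (my \$s = \$_[0]) =~ tr/\Q$pat\E/\Q$repl\E/; \$s }"
    or die $@;

binmode STDOUT, ':encoding(UTF-8)';
print $to_ascii->('ćàƀ ɓǎč'), "\n";
```

The string eval runs once at startup, so each request still pays for only a single tr pass.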

Of course, you could always use Text::Unidecode.

I would expect the second version, with three operations, to be slower: each tr makes its own full pass over $text, so characters that have already been substituted are scanned again.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow