I'm running a series of regexes against blocks of data. We recently upgraded from ActiveState Perl 5.8 32-bit (I know... extremely old!) to Perl 5.16 64-bit. All the hardware stayed the same (Windows).

We are noticing a performance hit: our parse loop used to take about 2.5 seconds, and now it takes about 5 seconds. Can anybody give me a hint as to what would cause the change? I was expecting an increase in performance, as my understanding was that the engine had improved greatly. Any docs on what I should be doing differently would be greatly appreciated.


Solution

Yes, the regex engine improved greatly after v8. In v10 alone, we saw:

  • pattern recursion
  • named captures
  • possessive quantifiers
  • backtrack control verbs like (*FAIL) or (*SKIP)
  • The \K operator
  • … and some more

Also, more internals were made Unicode-aware.
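As a rough illustration of three of the v10 additions listed above (named captures, pattern recursion and \K), here is a minimal sketch. The input strings and variable names are made up for the example and have nothing to do with the asker's data.

```perl
use strict;
use warnings;
use 5.010;   # named captures, recursion and \K all need perl 5.10 or later

# Named captures: results are read from %+ instead of $1, $2, ...
my $line = 'user=alice id=42';            # hypothetical input line
my ($user, $id);
if ($line =~ /user=(?<user>\w+)\s+id=(?<id>\d+)/) {
    ($user, $id) = ($+{user}, $+{id});
}

# Pattern recursion: (?1) re-enters capture group 1, which lets a regex
# match balanced parentheses.
my $balanced  = qr/^(\((?:[^()]++|(?1))*\))$/;
my $is_closed = '(a(b)c)' =~ $balanced ? 1 : 0;
my $is_broken = '(a(b)c'  =~ $balanced ? 1 : 0;

# \K: text matched before \K is kept out of the part that gets replaced.
(my $path = '/tmp/report.txt') =~ s{^.*/\K[^/]+$}{summary.txt};

say "$user $id $is_closed $is_broken $path";
```

None of this runs on 5.8, so code that was written for the old interpreter will not be using these features yet.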

In v12, the Unicode support was cleaned up. The \p and \X operators in regexes are now greatly enhanced.

In v14, the Unicode support was bumped to 6.0. Charnames for the \N operator were improved (see also charnames pragma). The new character model can treat any unsigned integer as a codepoint. In the regex engine,

  • regexes can now carry charclass modifiers like /u, /d, /l, /a, /aa.
  • Non-destructive substitution with /r was implemented.
  • The RE engine is now reentrant, so embedded code can use regexes.
  • \p was cleaned up
  • regex compilation is faster when a switch to Unicode semantics is necessary.
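To make two of the v14 items above concrete, here is a small sketch of /r and of the /a charclass modifier. The sample strings are assumptions chosen for the example.

```perl
use strict;
use warnings;
use 5.014;   # /r and the /a, /u, /l, /d modifiers need perl 5.14 or later

my $original = 'Hello World';
# /r returns the modified copy and leaves $original untouched.
my $shouted = $original =~ s/World/PERL/r;

# Under /a, \d matches only ASCII 0-9. Without it, \d also accepts other
# Unicode decimal digits when the string holds characters above 0xFF.
my $arabic_three  = "\x{0663}";   # ARABIC-INDIC DIGIT THREE
my $unicode_digit = ($arabic_three =~ /^\d$/)  ? 1 : 0;
my $ascii_digit   = ($arabic_three =~ /^\d$/a) ? 1 : 0;

say "$original / $shouted / $unicode_digit / $ascii_digit";
```

If your data is plain ASCII, restricting the character classes with /a may be worth benchmarking, but whether it helps depends on the patterns involved.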

In v16, perl almost supports Unicode 6.1. In the regex engine,

  • efficiency of \p charclasses was increased.
  • Various regex bugs (often involving case-insensitive matching) were fixed.

Obviously, not all of these features come at a price, but Unicode-awareness in particular makes the internals more complicated, and slower.

You also cannot simply wave a hand and state that the execution time of a script doubled from perl5 v8 x86 to perl5 v16 x64; there are too many variables:

  • were both Perls compiled with the same flags?
    • are both perls threaded perls? (disabling threading support makes it faster)
    • how big are your integers? 64 bit or 32 bit?
    • what compiler optimizations were chosen?
  • did your previous Perl have some distribution-specific patches applied?

Basically, you have to compare the whole perl -V output.
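If the full perl -V dump is too noisy, the core Config module exposes the same build settings as a hash. The selection of keys below is only my guess at the most relevant ones for the questions above. Run the script under both interpreters and diff the output.

```perl
use strict;
use warnings;
use Config;   # core module holding the values that perl -V prints

# Build settings that relate to threading, integer size and optimization.
my @keys = qw(version archname useithreads ivsize use64bitint optimize ccflags);

# Written without the // operator so the script also runs on perl 5.8.
my %build = map { $_ => (defined $Config{$_} ? $Config{$_} : 'undef') } @keys;

printf "%-12s %s\n", $_, $build{$_} for @keys;
```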


If you are hitting a performance ceiling with regexes, they may be the wrong tool for extensive parsing. At the very least, you may use the newer features to optimize the regexes to eliminate some backtracking.
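As a sketch of what "eliminating backtracking" can look like, the following compares a greedy quantifier with its possessive counterpart on a record that cannot match. The record format is hypothetical, and the (?{ ... }) counters are there only to make the retries visible; they are not something you would keep in production code.

```perl
use strict;
use warnings;
use 5.010;   # possessive quantifiers need perl 5.10 or later

# Package variables, so the embedded code blocks can update them safely.
our ($greedy_tries, $possessive_tries) = (0, 0);

# Hypothetical record: digits followed directly by ':'. Here the ':' comes
# too late, so both patterns have to fail.
my $record = '12345 x:';

# Each (?{ ... }) block counts how often the engine gets past the digits.
my $greedy_hit     = $record =~ /^\d+(?{ $greedy_tries++ }):/      ? 1 : 0;
my $possessive_hit = $record =~ /^\d++(?{ $possessive_tries++ }):/ ? 1 : 0;

say "greedy:     hit=$greedy_hit tries=$greedy_tries";
say "possessive: hit=$possessive_hit tries=$possessive_tries";
```

The greedy version gives digits back one at a time before giving up, while \d++ refuses to, so it fails after a single attempt. On simple patterns the optimizer may already avoid most of this work, so measure before rewriting everything.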

If your parsing code describes a (roughly) context-free language (i.e. you don't use (?{...}), (?=...) or related regex features), and parsing means doing something like generating a tree, then Marpa::R2 might speed things up considerably.

Other tips

If you are looking for better performance, you may also want to make sure that a regex is what you want. You didn't specify what kind of regexes your system was using, but often you can replace a regex with a built-in function.

Examples:

if (lc($name) eq 'bob') { $bob_count++ }  # Faster
if ($name =~ /^bob$/i)  { $bob_count++ }  # Slower

my $sentiment = "I don't like beans.";
substr($sentiment, 13, 5) = 'broccoli';   # Faster (offset 13, length 5 covers "beans")
$sentiment = "I don't like beans.";
$sentiment =~ s/beans/broccoli/;          # Slower

These examples, as well as unpack and index, might not apply to your code, but if they do, you should benchmark them and see whether they help with performance.
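For the benchmarking step, the core Benchmark module is probably the easiest starting point. The sketch below uses made-up test data and sub names; swap in a sample of your real input, because the relative speeds depend heavily on the data and on the Perl build.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 1000 names, 600 of which are some spelling of "bob".
my @names = ('Bob', 'alice', 'BOB', 'carol', 'bob') x 200;

sub count_with_lc    { my $n = 0; for (@names) { $n++ if lc($_) eq 'bob' } return $n }
sub count_with_regex { my $n = 0; for (@names) { $n++ if /^bob$/i }        return $n }

# index finds a fixed substring without the regex engine (-1 means absent).
my $pos = index("I don't like beans.", 'beans');

# unpack splits fixed-width records; 'A' strips trailing spaces.
my ($rec_id, $rec_name) = unpack 'A4 A10', '0042alice     ';

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, { lc_eq => \&count_with_lc, regex => \&count_with_regex });
```

Running the same script under both the old and the new interpreter also gives you a number that is easier to compare than the total runtime of the parse loop.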
