Question

I have an array full of patterns that I need matched. Any way to do that, other than a for() loop? Im trying to do it in the least CPU intensive way, since I will be doing dozens of these every minute.

Real world example is, Im building a link status checker, which will check links to various online video sites, to ensure that the videos are still live. Each domain has several "dead keywords", if these are found in the html of a page, that means the file was deleted. These are stored in the array. I need to match the contents pf the array, against the html output of the page.

Was it helpful?

Solution

First of all, if you literally are only doing dozens every minute, then I wouldn't worry terribly about the performance in this case. These matches are pretty quick, and I don't think you're going to have a performance problem by iterating through your patterns array and calling preg_match separately like this:

$matches = false;
foreach ($pattern_array as $pattern)
{
  if (preg_match($pattern, $page))
  {
    $matches = true;
  } 
}

You can indeed combine all the patterns into one using the or operator like some people are suggesting, but don't just slap them together with a |. This will break badly if any of your patterns contain the or operator.

I would recommend at least grouping your patterns using parenthesis like:

foreach ($patterns as $pattern)
{
  $grouped_patterns[] = "(" . $pattern . ")";
}
$master_pattern = implode($grouped_patterns, "|");

But... I'm not really sure if this ends up being faster. Something has to loop through them, whether it's the preg_match or PHP. If I had to guess I'd guess that individual matches would be close to as fast and easier to read and maintain.

Lastly, if performance is what you're looking for here, I think the most important thing to do is pull out the non regex matches into a simple "string contains" check. I would imagine that some of your checks must be simple string checks like looking to see if "This Site is Closed" is on the page.

So doing this:

foreach ($strings_to_match as $string_to_match)
{
  if (strpos($page, $string_to_match) !== false))
  {
    // etc.
    break;
  }
}
foreach ($pattern_array as $pattern)
{
  if (preg_match($pattern, $page))
  {
    // etc.
    break;
  } 
}

and avoiding as many preg_match() as possible is probably going to be your best gain. strpos() is a lot faster than preg_match().

OTHER TIPS

// assuming you have something like this
$patterns = array('a','b','\w');

// converts the array into a regex friendly or list
$patterns_flattened = implode('|', $patterns);

if ( preg_match('/'. $patterns_flattened .'/', $string, $matches) )
{
}

// PS: that's off the top of my head, I didn't check it in a code editor

If your patterns don't contain many whitespaces, another option would be to eschew the arrays and use the /x modifier. Now your list of regular expressions would look like this:

$regex = "/
pattern1|   # search for occurences of 'pattern1'
pa..ern2|   # wildcard search for occurences of 'pa..ern2'
pat[ ]tern| # search for 'pat tern', whitespace is escaped
mypat       # Note that the last pattern does NOT have a pipe char
/x";

With the /x modifier, whitespace is completely ignored, except when in a character class or preceded by a backslash. Comments like above are also allowed.

This would avoid the looping through the array.

If you're merely searching for the presence of a string in another string, use strpos as it is faster.

Otherwise, you could just iterate over the array of patterns, calling preg_match each time.

If you have a bunch of patterns, what you can do is concatenate them in a single regular expression and match that. No need for a loop.

What about doing a str_replace() on the HTML you get using your array and then checking if the original HTML is equal to the original? This would be very fast:

 $sites = array(
      'you_tube' => array('dead', 'moved'),
      ...
 );
 foreach ($sites as $site => $deadArray) {
     // get $html
     if ($html == str_replace($deadArray, '', $html)) { 
         // video is live
     }
 }

You can combine all the patterns from the list to single regular expression using implode() php function. Then test your string at once using preg_match() php function.

$patterns = array(
  'abc',
  '\d+h',
  '[abc]{6,8}\-\s*[xyz]{6,8}',
);

$master_pattern = '/(' . implode($patterns, ')|(') . ')/'

if(preg_match($master_pattern, $string_to_check))
{
  //do something
}

Of course there could be even less code using implode() inline in "if()" condition instead of $master_pattern variable.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top