Question

I am trying to build a regular expression that matches different types of echo statements. The word echo has already been matched.

Example patterns to be matched

"hiii";
"how"."are"."you";
$var."abc";
"abc".$var;
'how'."how".$var;

Pattern for a variable name:

/^[a-zA-Z_][a-zA-Z0-9_]*/

I already have a pattern that matches the first two examples:

/((^"[^"]*"\.{0,1})*;)/

Solution

In addition to the two other suggestions: if you want a PCRE-based regex in PHP to validate a subset of PHP, you can do it in a more structured way by defining named subpatterns for the tokens you are looking for. Here is an example pattern that matches these statements and also allows whitespace around the tokens (as PHP would), for any single-byte charset that extends US-ASCII (I think this is how PHP actually treats source files, even if they are UTF-8 encoded):

// A nowdoc keeps every backslash exactly as written, so this is the raw regex.
$pattern = <<<'PATTERN'
~
(?(DEFINE)
    (?<stringDoubleQuote> "(?:\\"|[^"])+")
    (?<stringSingleQuote> '(?:\\'|[^'])+')
    (?<string> (?:(?&stringDoubleQuote)|(?&stringSingleQuote)))
    (?<variable> \$([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*))
    (?<varorstring> (?:(?&variable)|(?&string)))
)
^ \s* (?&varorstring) (?: \s* \. \s* (?&varorstring) )* \s* ; $
~x
PATTERN;

Thanks to the named subpatterns, it is easy to reuse one token for any string or variable and then add the whitespace handling and the string concatenation operator around it. With the pattern assigned to $pattern, an example of use is:

$lines = <<<'LINES'
"hiii";
"how"."are"."you";
$var."abc";
"abc".$var;
'how'."how".$var;
LINES;

foreach (explode("\n", $lines) as $subject) {
    $result = preg_match($pattern, $subject);
    if (FALSE === $result) {
        throw new LogicException('PCRE pattern did not compile.');
    }
    printf("%s %s match.\n", var_export($subject, true), $result ? 'did' : 'did not');
}

Output:

'"hiii";' did match.
'"how"."are"."you";' did match.
'$var."abc";' did match.
'"abc".$var;' did match.
'\'how\'."how".$var;' did match.

Demo: https://eval.in/142721

OTHER TIPS

Regular expressions aren't a solution for everything. In this case it is clear that you want to parse PHP code. Just as you shouldn't parse HTML with regex, you shouldn't parse PHP with regex.

Instead, use PHP's tokenizer (token_get_all()), which splits PHP source code into the same tokens the engine itself uses.
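
As a rough illustration of the tokenizer approach, here is a minimal sketch. The helper name isSimpleEchoArgument() is made up for this example, and it assumes the input is the text that follows the word echo. It accepts only plain string literals and variables joined by the dot operator; double-quoted strings that contain interpolated variables are tokenized differently and are rejected by this sketch.

```php
<?php
// Hypothetical helper (not from the original answer): returns true when $code,
// the text following "echo", is a chain of plain string literals and
// variables joined by "." and terminated by ";".
function isSimpleEchoArgument(string $code): bool
{
    // Prefix the input so the tokenizer sees a complete echo statement.
    $tokens = token_get_all('<?php echo ' . $code);

    // Skip T_OPEN_TAG and T_ECHO, drop whitespace, and reduce each token
    // to its id (array tokens) or its single character (string tokens).
    $items = [];
    foreach (array_slice($tokens, 2) as $token) {
        if (is_array($token) && $token[0] === T_WHITESPACE) {
            continue;
        }
        $items[] = is_array($token) ? $token[0] : $token;
    }

    $expectOperand = true;
    $last = count($items) - 1;
    foreach ($items as $i => $item) {
        if ($expectOperand) {
            // An operand must be a plain string literal or a variable.
            if ($item !== T_CONSTANT_ENCAPSED_STRING && $item !== T_VARIABLE) {
                return false;
            }
            $expectOperand = false;
        } elseif ($item === '.') {
            $expectOperand = true;
        } else {
            // After an operand, only a final ";" is acceptable.
            return $item === ';' && $i === $last;
        }
    }

    return false; // ran out of tokens without seeing the terminating ";"
}

foreach (['"hiii";', '$var . "abc";', '"abc" $var;'] as $subject) {
    printf("%s %s\n", $subject, isSimpleEchoArgument($subject) ? 'accepted' : 'rejected');
}
```

Because the tokenizer already handles quoting, escapes and whitespace, the check itself only has to look at the order of the tokens.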

You can match the double-quoted cases (the first two examples) with the following regex, without needing recursion:

^"[^"]+"(\."[^"]+")*;$

Demo: http://regex101.com/r/oW5zH4
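
To see what this simpler pattern covers, here is a small sketch that runs it against the five examples from the question. The "/" delimiters and the variable names are assumptions added for the example. Since the pattern only knows double-quoted strings, it should accept the first two examples and reject the ones that involve a variable or a single-quoted string.

```php
<?php
// The simple pattern from above, wrapped in "/" delimiters for preg_match().
$simplePattern = '/^"[^"]+"(\."[^"]+")*;$/';

// The five example statements from the question.
$subjects = [
    '"hiii";',
    '"how"."are"."you";',
    '$var."abc";',
    '"abc".$var;',
    '\'how\'."how".$var;',
];

// Record and print whether each statement matches.
$results = [];
foreach ($subjects as $subject) {
    $results[$subject] = preg_match($simplePattern, $subject) === 1;
    printf("%s %s\n", $subject, $results[$subject] ? 'matches' : 'does not match');
}
```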

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow