Question

I'm using file_get_contents() on PHP scripts to auto-list variable names in order of appearance using preg_match_all() then filtering them by first appearance using array_filter(). I need to be able to catch variables in the following formats (where any variable name contains the characters [a-zA-Z_-\d]):

//standard variables
$variable

So far I've written a simple regex that fits all of my requirements for this single stand-alone variable: [a-zA-Z_-\d]

//arrays
$array[0]
$array2[$key]

//form and session variables
$_GET['frm']
$_SESSION['user']
$_POST[$var]
etc...

I've attempted the array and form/session variables myself and have reached the expression ([$][a-zA-Z_-\d]+\[?\'?[$]?[a-zA-Z_-\d]+\'?\]?); see http://regex101.com/r/dO8tI2 for a demonstration of how far I've got. However, I am now catching things I don't want, and I'm not sure how to adjust for this, e.g. the bracket on match 7. How do I select only the text $test, and not $test], when there is no opening [? The same question applies to variable variables:

//variable variables
${$var."a"}
${$var.$var2}
${"a"."b"}
${$var."a".$var3}
etc...

Where the character " is used in the examples above, the character ' would also need to be accepted in its place.

As I am fairly new to regex and am stuck at the point shown in the link above, could someone write a single regex that fits the above criteria and briefly explain how it works?

I've tried playing around with the OR operator | and non-capturing groups (?:...), but I'm going around in circles.

Solution

I've come up with the following regex, which matches your requirements:

(\$[a-zA-Z_-\d]+(\[('[^']+'|\$?[a-zA-Z_-\d]+)\])?)

Explanation:

  1. \$ : matches a literal $.
  2. [a-zA-Z_-\d]+ : followed by one or more alphanumeric characters, dashes or underscores.
  3. (\[('[^']+'|\$?[a-zA-Z_-\d]+)\])? : optionally followed by a single bracketed part, which starts with [ and ends with ] and contains text matching either '[^']+' or \$?[a-zA-Z_-\d]+. Let's break those two alternatives apart:
    • '[^']+' : matches any non-empty string between single quotes.
    • \$?[a-zA-Z_-\d]+ : matches a string that optionally starts with $ and consists of one or more alphanumeric characters, dashes or underscores.
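As a rough sketch of how the pattern above could be applied, the snippet below runs it through preg_match_all() and keeps each match only the first time it appears. Two things here are assumptions, not part of the answer: the sample input is made up (it stands in for the result of file_get_contents()), and the hyphen has been moved to the end of each character class so that it is unambiguously literal, since newer PCRE2-based PHP versions may reject a hyphen placed directly before \d. It only covers the standard, array and form/session forms; variable variables are not handled by this pattern.

```php
<?php
// The answer's pattern, with the hyphen moved to the end of each character
// class (an assumption made for portability, see the note above).
$pattern = '/(\$[a-zA-Z_\d-]+(\[(\'[^\']+\'|\$?[a-zA-Z_\d-]+)\])?)/';

// Hypothetical sample input; in the real script this would come from
// file_get_contents() on the PHP file being scanned.
$source = <<<'EOT'
$variable = 1;
$array[0] = $variable;
$array2[$key] = $_GET['frm'];
echo $test];
$_POST[$var] = $variable;
EOT;

// $matches[0] holds every full match in order of appearance.
preg_match_all($pattern, $source, $matches);

// array_unique() keeps the first occurrence of each value;
// array_values() reindexes the result from 0.
$names = array_values(array_unique($matches[0]));

print_r($names);
```

Because the bracketed part is a single optional group that needs both [ and ], the stray ] after $test is left out of the match, which is the case the question asked about.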
Licensed under: CC-BY-SA with attribution