Regex to match string between %
13-09-2019
Question
I'm trying to match substrings that are enclosed in %'s but preg_match_all
seems to include several at the same time in the same line.
Code looks like this:
preg_match_all("/%.*%/", "%hey%_thereyou're_a%rockstar%\nyo%there%", $matches);
print_r($matches);
Which produces the following output.
Array
(
    [0] => Array
        (
            [0] => %hey%_thereyou're_a%rockstar%
            [1] => %there%
        )

)
However I'd like it to produce the following array instead:
[0] => %hey%
[1] => %rockstar%
[2] => %there%
What am I missing?
Solution
Replace the ".*" in your regular expression with "[^%]*":
preg_match_all("/%[^%]*%/", "%hey%_thereyou're_a%rockstar%\nyo%there%", $matches);
What is happening is that the ".*" is "greedily" matching as much as it possibly can, including everything up to the final % on the line. Replacing it with the negated character class "[^%]*" means that it will instead match any run of characters other than a percent sign, so each match stops at the next %, which gives you just the bits that you want.
Another option would be to place a "?" after the "*", which tells it "don't be greedy":
preg_match_all("/%.*?%/", "%hey%_thereyou're_a%rockstar%\nyo%there%", $matches);
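To see the difference side by side, here is a small sketch that runs the greedy, negated-class and lazy patterns against the string from the question. The variable names ($subject, $greedy, $negated, $lazy) are just placeholders picked for this illustration.

```php
<?php
$subject = "%hey%_thereyou're_a%rockstar%\nyo%there%";

// Greedy: .* extends to the last % on each line (the dot does not match "\n").
preg_match_all("/%.*%/", $subject, $greedy);

// Negated class: [^%]* can never step over a %, so a match ends at the next one.
preg_match_all("/%[^%]*%/", $subject, $negated);

// Lazy: .*? grows one character at a time and stops at the first % it can reach.
preg_match_all("/%.*?%/", $subject, $lazy);

print_r($greedy[0]);   // two matches, the first spanning %hey% through %rockstar%
print_r($negated[0]);  // %hey%, %rockstar%, %there%
print_r($lazy[0]);     // the same three matches as the negated class
```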
In the above example, either option will work. However, there are times when the closing delimiter is longer than a single character. A negated character class cannot help in that case, so the solution is to un-greedify the match.
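As a rough illustration of that last point, suppose the delimiters were "{{" and "}}" instead of a single %. These delimiters and the sample string are made up for this sketch and are not from the question. A class like [^}] rejects every single "}", even one that is part of the content, while the lazy quantifier only stops at the full "}}".

```php
<?php
// Hypothetical two-character delimiters; the first placeholder contains a lone "}".
$subject = 'x {{a}b}} y {{c}}';

// Lazy match: stops at the first complete "}}" after each "{{".
preg_match_all('/\{\{.*?\}\}/', $subject, $lazy);

// Negated class: [^}]* cannot pass the lone "}" inside "{{a}b}}",
// so that placeholder is missed entirely.
preg_match_all('/\{\{[^}]*\}\}/', $subject, $negated);

print_r($lazy[0]);     // {{a}b}} and {{c}}
print_r($negated[0]);  // only {{c}}
```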
OTHER TIPS
You're doing a greedy match. Put a ? after the quantifier to make it ungreedy:
/%.*?%/
If a newline can occur inside the match, add the s (DOTALL) modifier:
/%.*?%/s
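A quick sketch of why the s modifier matters, using a made-up string in which the first %...% pair spans two lines. Without s the dot refuses to cross the newline, so the engine ends up pairing the wrong percent signs.

```php
<?php
// Hypothetical input: the first delimited chunk contains a newline.
$subject = "%multi\nline% and %one%";

// Without s: the dot stops at "\n", so the match starting at the first % fails
// and the engine pairs the second % with the third one instead.
preg_match_all("/%.*?%/", $subject, $plain);

// With s (DOTALL): the dot also matches "\n", so the pairs line up as intended.
preg_match_all("/%.*?%/s", $subject, $dotall);

var_dump($plain[0]);   // a single match: "% and %"
var_dump($dotall[0]);  // "%multi\nline%" and "%one%"
```

Note that the negated class [^%]* matches newlines without any modifier, since it only excludes the percent sign.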
Add a ? after the *:
preg_match_all("/%.*?%/", "%hey%_thereyou're_a%rockstar%\nyo%there%", $matches);
The reason is that the star is greedy. That is, the star causes the regex engine to repeat the preceding token as often as possible. You should try .*? instead.
You could try /%[^%]+%/ - this means that in between the percent signs you only want to match characters which are not percent signs.
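One thing worth knowing about the + variant: it requires at least one character between the percent signs, so an empty pair "%%" is not matched, and that can shift which signs get paired. A small sketch on a made-up string:

```php
<?php
// Hypothetical input containing an empty pair "%%".
$subject = "a%%b%c%";

// * allows zero characters, so "%%" itself is a match.
preg_match_all("/%[^%]*%/", $subject, $star);

// + needs at least one character, so "%%" is skipped and the
// second % is paired with the third one instead.
preg_match_all("/%[^%]+%/", $subject, $plus);

print_r($star[0]);  // %% and %c%
print_r($plus[0]);  // only %b%
```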
You could also make the whole pattern ungreedy with the U modifier, e.g. /%.+%/U, so that every quantifier captures as little as possible by default.
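Here is a short sketch of the U modifier on the string from the question. U (PCRE_UNGREEDY) inverts the default, so a quantifier followed by ? becomes greedy again, which is easy to trip over. Variable names are placeholders.

```php
<?php
$subject = "%hey%_thereyou're_a%rockstar%\nyo%there%";

// U makes .+ lazy by default, equivalent to /%.+?%/ without the modifier.
preg_match_all("/%.+%/U", $subject, $ungreedy);

// Under U the meaning of ? is flipped: .+? is greedy again.
preg_match_all("/%.+?%/U", $subject, $flipped);

print_r($ungreedy[0]);  // %hey%, %rockstar%, %there%
print_r($flipped[0]);   // back to the greedy result from the question
```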
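|%(\w+)%| This will do what you want, as long as the text between the percent signs consists only of word characters (letters, digits and underscores). The parentheses also capture that text without the surrounding % signs.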
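A sketch of how the capture group behaves, assuming the string from the question plus one made-up string containing a space. With preg_match_all's default ordering, index 0 holds the full matches and index 1 holds the contents of the first group.

```php
<?php
$subject = "%hey%_thereyou're_a%rockstar%\nyo%there%";

// | is used as the pattern delimiter here; (\w+) captures the inner text.
preg_match_all('|%(\w+)%|', $subject, $matches);

print_r($matches[0]);  // %hey%, %rockstar%, %there%
print_r($matches[1]);  // hey, rockstar, there

// Hypothetical input with a space: \w does not match spaces,
// so "%hello world%" is skipped and only "%ok%" is found.
preg_match_all('|%(\w+)%|', "%hello world% %ok%", $spaced);

print_r($spaced[1]);   // ok
```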