Regex matching between curly brackets yields too many results

https://stackoverflow.com/questions/13415832

29-11-2021

Question

I have a bunch of text, for example:

foofoofooabcdefhjkldh389dn{pdf}images/1.pdf,100%,500{/pdf}hfnkjt8499duidjglkj

I'd like to extract the following:

{pdf}images/1.pdf,100%,500{/pdf}

So here's a regex I made:

#{pdf}(.*?){/pdf}#

When checking the results I get back:

Array
(
[0] => {pdf}images/1.pdf,100%,500{/pdf}
[1] => images/1.pdf,100%,500
)

I expected to get only the first item in the array, but instead there are two items. I'm using PHP, and for testing I use the following website: PHP Regex Tester.

How can I only obtain the {pdf}...{/pdf} text?

Solution

You're using a capturing group in your regex. In your case the group is

(.*?)

This causes PHP to give you both the full match, {pdf}sometext{/pdf}, and sometext as captured by the first group.
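A minimal sketch of how the matches array is laid out, assuming preg_match is the function in use (the question doesn't name it). Index 0 always holds the full match, and each capturing group adds one more element after it. The second pattern, with two groups, is a hypothetical variant added only to show the numbering.

```php
<?php
$text = 'foofoofooabcdefhjkldh389dn{pdf}images/1.pdf,100%,500{/pdf}hfnkjt8499duidjglkj';

// One capturing group: [0] is the whole match, [1] is the group's text.
preg_match('#{pdf}(.*?){/pdf}#', $text, $matches);
print_r($matches);

// Hypothetical variant with two capturing groups: [1] and [2] follow the full match.
// [^,]* captures everything before the first comma; (.*?) captures the rest.
preg_match('#{pdf}([^,]*),(.*?){/pdf}#', $text, $twoGroups);
print_r($twoGroups);
```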

Just drop the parentheses to get rid of the group:

#{pdf}.*?{/pdf}#
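A short sketch of the group-free pattern, again assuming preg_match. With no capturing group, the matches array has a single element: the full {pdf}...{/pdf} text.

```php
<?php
$text = 'foofoofooabcdefhjkldh389dn{pdf}images/1.pdf,100%,500{/pdf}hfnkjt8499duidjglkj';

// No capturing group, so $matches holds only the full match at index 0.
if (preg_match('#{pdf}.*?{/pdf}#', $text, $matches)) {
    echo $matches[0], PHP_EOL;
}
```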

OTHER TIPS

You do not have two results.

The problem (though it isn't really a problem) is that you are probably using preg_match. This function returns both the whole match, that is {pdf}images/1.pdf,100%,500{/pdf}, at index 0, and the text captured by the group, that is images/1.pdf,100%,500, at index 1.

So you only need to use $result[1] for further parsing.
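A minimal sketch of keeping the original capturing pattern and simply picking the element you need. The comma split at the end is only a hypothetical example of "further parsing"; the question doesn't say what the three comma-separated values mean.

```php
<?php
$text = 'foofoofooabcdefhjkldh389dn{pdf}images/1.pdf,100%,500{/pdf}hfnkjt8499duidjglkj';

$parts = [];
if (preg_match('#{pdf}(.*?){/pdf}#', $text, $result)) {
    $whole = $result[0]; // full match, delimiters included
    $inner = $result[1]; // text captured by the group
    // Hypothetical further parsing: split the captured text on commas.
    $parts = explode(',', $inner);
}
print_r($parts);
```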

Use a non-capturing group so the central text doesn't show up as a separate captured element in the array, and use zero-width assertions (a lookbehind and a lookahead) so the {pdf} and {/pdf} parts aren't included in the match:

#(?<={pdf})(?:.*?)(?={/pdf})#
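A short sketch of the lookaround pattern with preg_match. The lookbehind and lookahead must succeed for the match to happen, but because they are zero-width, the delimiters are left out of $matches[0].

```php
<?php
$text = 'foofoofooabcdefhjkldh389dn{pdf}images/1.pdf,100%,500{/pdf}hfnkjt8499duidjglkj';

// (?<={pdf}) and (?={/pdf}) check the surroundings without consuming them.
if (preg_match('#(?<={pdf})(?:.*?)(?={/pdf})#', $text, $matches)) {
    echo $matches[0], PHP_EOL;
}
```

The (?:...) wrapper isn't strictly needed here: a bare .*? between the two assertions behaves the same, since it captures nothing either way.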

If you want to keep the {pdf} delimiters:

#{pdf}(?:.*?){/pdf}#
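A quick check, assuming preg_match, that wrapping the middle in a non-capturing group gives exactly the same array as leaving the group out entirely: both yield only the full match.

```php
<?php
$text = 'foofoofooabcdefhjkldh389dn{pdf}images/1.pdf,100%,500{/pdf}hfnkjt8499duidjglkj';

// (?:...) groups without capturing, so no extra array element is produced.
preg_match('#{pdf}(?:.*?){/pdf}#', $text, $nonCapturing);
preg_match('#{pdf}.*?{/pdf}#', $text, $noGroup);

var_dump($nonCapturing === $noGroup);
```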

Licensed under: CC-BY-SA with attribution
