Question
I have a bunch of text, for example:
foofoofooabcdefhjkldh389dn{pdf}images/1.pdf,100%,500{/pdf}hfnkjt8499duidjglkj
I'd like to extract the following:
{pdf}images/1.pdf,100%,500{/pdf}
So here's a regex I made:
#{pdf}(.*?){/pdf}#
When checking the results I get back:
Array
(
[0] => {pdf}images/1.pdf,100%,500{/pdf}
[1] => images/1.pdf,100%,500
)
I expected to get only the first item in the array, but instead there are two items. I'm using PHP, and for testing I use the following website: PHP Regex Tester
How can I obtain only the {pdf}...{/pdf} text?
Solution
You're using a capturing group in your regex. In your case the group is
(.*?)
This causes PHP to give you both the full match, {pdf}sometext{/pdf}, and sometext as captured by the first group.
To get rid of the group, try the following:
#{pdf}.*?{/pdf}#
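As a minimal sketch of the pattern above, assuming the tester calls preg_match (the question does not say which function it uses), the sample text from the question can be matched like this. The variable names $text and $matches are only illustrative.

```php
<?php
// Sample input taken from the question.
$text = 'foofoofooabcdefhjkldh389dn{pdf}images/1.pdf,100%,500{/pdf}hfnkjt8499duidjglkj';

// The pattern has no parentheses, so there is no capturing group
// and preg_match fills only index 0 (the full match).
$matches = [];
if (preg_match('#{pdf}.*?{/pdf}#', $text, $matches)) {
    print_r($matches);
}
```

With no group in the pattern, the array should hold a single entry at index 0 containing the whole {pdf}...{/pdf} text.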
OTHER TIPS
You do not have two results.
The issue (it is not really a problem) is that a function such as preg_match is probably being used. This function returns both the whole match, that is {pdf}images/1.pdf,100%,500{/pdf}, and the text captured by the first group, that is images/1.pdf,100%,500.
So you only need $result[1] for further parsing of the inner text, or $result[0] if you want the whole {pdf}...{/pdf} text.
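A short sketch of that behavior, assuming preg_match is the function in use and reusing the original pattern with its capturing group. The names $text, $result, $whole and $inner are illustrative.

```php
<?php
// Sample input taken from the question.
$text = 'foofoofooabcdefhjkldh389dn{pdf}images/1.pdf,100%,500{/pdf}hfnkjt8499duidjglkj';

// Original pattern from the question, with the capturing group kept.
$result = [];
preg_match('#{pdf}(.*?){/pdf}#', $text, $result);

$whole = $result[0]; // full match, delimiters included
$inner = $result[1]; // text captured by the first group only

echo $whole, "\n";
echo $inner, "\n";
```

Both values come from a single match, so keeping the group costs nothing if you simply pick the index you need.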
Use a non-capturing group to ensure the central text doesn't show up as a separate captured entry in the array, and use zero-width assertions (lookbehind and lookahead) to ensure the {pdf} and {/pdf} delimiters aren't part of the match:
#(?<={pdf})(?:.*?)(?={/pdf})#
If you want to keep the {pdf} delimiters:
#{pdf}(?:.*?){/pdf}#
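The two patterns above can be compared side by side. This is only a sketch, assuming preg_match and the sample text from the question; $text, $inner and $full are illustrative names.

```php
<?php
// Sample input taken from the question.
$text = 'foofoofooabcdefhjkldh389dn{pdf}images/1.pdf,100%,500{/pdf}hfnkjt8499duidjglkj';

// Lookbehind and lookahead check for the delimiters without consuming them,
// so the match at index 0 is only the text between the tags.
$inner = [];
preg_match('#(?<={pdf})(?:.*?)(?={/pdf})#', $text, $inner);

// Non-capturing group with the delimiters inside the match:
// index 0 is the whole {pdf}...{/pdf} text and no index 1 is created.
$full = [];
preg_match('#{pdf}(?:.*?){/pdf}#', $text, $full);

print_r($inner);
print_r($full);
```

In both cases the array has exactly one entry, because (?:...) groups without capturing and the lookarounds match a position rather than text.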