How to grab the contents of HTML tags?

https://stackoverflow.com/questions/38691

09-06-2019

Question

Hey so what I want to do is snag the content for the first paragraph. The string $blog_post contains a lot of paragraphs in the following format:

<p>Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3</p>

The problem I'm running into is that I am writing a regex to grab everything between the first <p> tag and the first closing </p> tag. However, it is grabbing everything between the first <p> tag and the last closing </p> tag, which results in me grabbing everything.

Here is my current code:

if (preg_match("/[\\s]*<p>[\\s]*(?<firstparagraph>[\\s\\S]+)[\\s]*<\\/p>[\\s\\S]*/",$blog_post,$blog_paragraph))
   echo "<p>" . $blog_paragraph["firstparagraph"] . "</p>";
else
  echo $blog_post;

Solution

Well, sysrqb's answer will let you match anything in the first paragraph, assuming there's no other HTML in the paragraph. You might want something more like this:

<p>.*?</p>

Placing the ? after your * makes it non-greedy, meaning it will match only as little text as necessary before matching the </p>.
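To make the difference concrete, here is a small sketch that runs both forms against the sample string from the question. The variable names $greedy and $lazy are only for illustration.

```php
<?php
// Sample input in the format shown in the question.
$blog_post = '<p>Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3</p>';

// Greedy: .* runs to the end of the string, then backtracks to the LAST </p>.
preg_match('/<p>(.*)<\/p>/', $blog_post, $greedy);

// Non-greedy: .*? grows one character at a time and stops at the FIRST </p>.
preg_match('/<p>(.*?)<\/p>/', $blog_post, $lazy);

echo $greedy[1], "\n"; // Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3
echo $lazy[1], "\n";   // Paragraph 1
```

The greedy capture is exactly the "grabbing everything" behaviour described in the question.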

OTHER TIPS

If you use preg_match, use the "U" flag to make it un-greedy.

preg_match("/<p>(.*)<\/p>/U", $blog_post, $matches);

$matches[1] will then contain the first paragraph.
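Two things are worth checking if you go the "U" route. The sketch below uses made-up sample strings ($three and $multiline are not from the question) to show that "U" inverts greediness, so a trailing ? makes the quantifier greedy again, and that . does not match a newline unless the "s" modifier is also set.

```php
<?php
$three = '<p>Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3</p>';
$multiline = "<p>Line one\nline two</p><p>Paragraph 2</p>";

// Pitfall 1: under "U", adding "?" flips the quantifier back to greedy,
// so this captures everything up to the last </p>.
preg_match('/<p>(.*?)<\/p>/U', $three, $flipped);

// Pitfall 2: without "s", "." cannot cross the newline, so the attempt at
// the first <p> fails and the engine moves on to the second paragraph.
preg_match('/<p>(.*)<\/p>/U', $multiline, $skipped);

// With "s" added, "." also matches newlines and the first paragraph is found.
preg_match('/<p>(.*)<\/p>/Us', $multiline, $whole);

echo $skipped[1], "\n"; // Paragraph 2
echo $whole[1], "\n";   // Line one, then line two on the next line
```

So if paragraphs in $blog_post can contain line breaks, "/Us" is probably closer to what you want than "/U" alone.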

It would probably be easier and faster to use strpos() to find the position of the first <p> and the first </p>, then use substr() to extract the paragraph.

 $paragraph_start = strpos($blog_post, '<p>');
 $paragraph_end = strpos($blog_post, '</p>', $paragraph_start);
 $paragraph = substr($blog_post, $paragraph_start + strlen('<p>'), $paragraph_end - $paragraph_start - strlen('<p>'));
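One thing the snippet above does not handle is a post with no <p> or no </p>: strpos() returns false in that case and the arithmetic quietly produces garbage. A minimal sketch of a guarded version follows. The function name first_paragraph() is hypothetical, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Returns the text between the first <p> and the next </p>, or null if
// either tag is missing.
function first_paragraph(string $html): ?string
{
    $open = '<p>';
    $close = '</p>';

    $start = strpos($html, $open);
    if ($start === false) {
        return null; // no opening tag at all
    }
    $start += strlen($open); // move past the opening tag

    $end = strpos($html, $close, $start);
    if ($end === false) {
        return null; // opening tag was never closed
    }

    return substr($html, $start, $end - $start);
}

echo first_paragraph('<p>Paragraph 1</p><p>Paragraph 2</p>'), "\n"; // Paragraph 1
```

Like the original, this only recognises a bare <p> with no attributes.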

Edit: Actually the regex in others' answers will be easier and faster... your big complex regex in the question confused me...

Using regular expressions for HTML parsing is never the right solution. You should be using XPath for this particular case:

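// Sample data in the format from the question. The <post> wrapper is an
// assumption: SimpleXMLElement needs well-formed XML with a single root.
$string = <<<XML
<post>
 <p>Paragraph 1</p>
 <p>Paragraph 2</p>
 <p>Paragraph 3</p>
</post>
XML;

$xml = new SimpleXMLElement($string);

/* Find every <p> that is the first <p> child of its parent */
$resultado = $xml->xpath('//p[1]');

// xpath() returns an array of SimpleXMLElement objects
echo (string) $resultado[0]; // Paragraph 1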
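SimpleXMLElement expects well-formed XML, which a real blog post may not be. As a sketch under that assumption, the same XPath idea can be applied with DOMDocument::loadHTML(), which accepts HTML fragments. The sample string and variable names here are illustrative, and the example sticks to ASCII because loadHTML() has its own encoding defaults.

```php
<?php
$blog_post = '<p>Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3</p>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($blog_post);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// (//p)[1] selects the first <p> in document order, wherever it is nested.
$nodes = $xpath->query('(//p)[1]');
$first = $nodes->length > 0 ? $nodes->item(0) : null;

// textContent gives the text with any nested tags stripped.
$text = $first !== null ? $first->textContent : null;

echo $text, "\n"; // Paragraph 1
```

If you need the markup rather than the text, $dom->saveHTML($first) serialises the node itself.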

Licensed under: CC-BY-SA with attribution
