Question

Hey, so what I want to do is grab the content of the first paragraph. The string $blog_post contains many paragraphs in the following format:

<p>Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3</p>

The problem I'm running into is that I am writing a regex to grab everything between the first <p> tag and the first closing </p> tag. However, it matches from the first <p> tag to the last closing </p> tag, so I end up grabbing everything.

Here is my current code:

if (preg_match("/[\\s]*<p>[\\s]*(?<firstparagraph>[\\s\\S]+)[\\s]*<\\/p>[\\s\\S]*/",$blog_post,$blog_paragraph))
   echo "<p>" . $blog_paragraph["firstparagraph"] . "</p>";
else
  echo $blog_post;

Solution

Well, sysrqb's answer will let you match anything in the first paragraph, assuming there's no other HTML in the paragraph. You might want something more like this:

<p>.*?</p>

Placing the ? after your * makes it non-greedy (lazy), meaning it matches as little text as necessary before reaching the first </p>.
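
To make the difference concrete, here is a minimal sketch that runs the greedy and the lazy pattern against the sample string from the question. The ~ delimiter and the s modifier are my additions, not part of the pattern above: ~ avoids escaping the slash in </p>, and s lets . match newlines in case a paragraph spans several lines.

```php
<?php
// Sample input in the format described in the question.
$blog_post = '<p>Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3</p>';

// Greedy: .* runs to the end of the string, then backtracks to the LAST </p>.
preg_match('~<p>(.*)</p>~s', $blog_post, $greedy);

// Lazy: .*? grows one character at a time and stops at the FIRST </p>.
preg_match('~<p>(.*?)</p>~s', $blog_post, $lazy);

echo $greedy[1], "\n"; // Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3
echo $lazy[1], "\n";   // Paragraph 1
```

The capture group keeps the tags out of the result, so $lazy[1] can be echoed inside fresh <p> tags the same way the question's code does.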

OTHER TIPS

If you use preg_match, use the "U" flag to make it un-greedy.

preg_match('/<p>(.*)<\/p>/U', $blog_post, $matches);

$matches[1] will then contain the first paragraph.
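
One thing to watch with the U modifier: it inverts greediness for the whole pattern, so a ? after a quantifier turns it greedy again. A small sketch of both cases, using the sample string from the question (the ~ delimiter is my choice, to avoid escaping the slash):

```php
<?php
$blog_post = '<p>Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3</p>';

// With U, a plain .* is lazy, so the match ends at the first </p>.
preg_match('~<p>(.*)</p>~U', $blog_post, $with_u);

// U inverts greediness, so .*? becomes greedy again under it.
preg_match('~<p>(.*?)</p>~U', $blog_post, $inverted);

echo $with_u[1], "\n";   // Paragraph 1
echo $inverted[1], "\n"; // Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3
```

So pick one style per pattern: either the U modifier or explicit lazy quantifiers, not both.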

It would probably be easier and faster to use strpos() to find the position of the first <p> and the first </p>, then use substr() to extract the paragraph.

 $paragraph_start = strpos($blog_post, '<p>');
 $paragraph_end = strpos($blog_post, '</p>', $paragraph_start);
 $paragraph = substr($blog_post, $paragraph_start + strlen('<p>'), $paragraph_end - $paragraph_start - strlen('<p>'));
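
strpos() returns false when the tag is not found, and the snippet above does not check for that. Here is a sketch of the same idea with guards added. The function name first_paragraph() is hypothetical, and the nullable return type assumes PHP 7.1 or later. It only recognizes a bare <p> tag, so <p class="..."> would not match.

```php
<?php
// first_paragraph() is a hypothetical helper name, not from the answer above.
function first_paragraph(string $html): ?string
{
    $open = '<p>';
    $close = '</p>';

    $start = strpos($html, $open);
    if ($start === false) {
        return null; // no opening tag at all
    }
    $start += strlen($open); // move past the opening tag

    $end = strpos($html, $close, $start);
    if ($end === false) {
        return null; // opening tag without a closing tag
    }

    return substr($html, $start, $end - $start);
}

var_dump(first_paragraph('<p>Paragraph 1</p><p>Paragraph 2</p>')); // string(11) "Paragraph 1"
var_dump(first_paragraph('no paragraphs here'));                    // NULL
```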

Edit: Actually the regex in others' answers will be easier and faster... your big complex regex in the question confused me...

Using regular expressions for HTML parsing is never the right solution. You should be using XPath for this particular case:

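/* SimpleXML needs a single root element, so the paragraphs are wrapped in a <post> element here (an assumption, not part of the question's input) */
$string = <<<XML
<post>
 <p>Paragraph 1</p>
 <p>Paragraph 2</p>
 <p>Paragraph 3</p>
</post>
XML;

$xml = new SimpleXMLElement($string);

/* Select the first <p> under each parent element */
$resultado = $xml->xpath('//p[1]');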
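
SimpleXMLElement expects well-formed XML with a single root element, which a run of sibling <p> tags like $blog_post is not. As an alternative sketch, DOMDocument::loadHTML() accepts an HTML fragment directly and DOMXPath can query it. This assumes the dom extension is enabled. It returns the paragraph's text content, so any markup inside the paragraph is dropped, and non-ASCII input may need an explicit encoding hint.

```php
<?php
$blog_post = '<p>Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3</p>';

$doc = new DOMDocument();
// libxml warns about markup it does not recognize; collect warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($blog_post);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// (//p)[1] is the first <p> in document order; //p[1] would be the first <p> under EACH parent.
$nodes = $xpath->query('(//p)[1]');

$first = $nodes->length > 0 ? $nodes->item(0)->textContent : null;
echo $first, "\n"; // Paragraph 1
```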
Licensed under: CC-BY-SA with attribution