Question

In stock PHP5, what is a good preg_replace expression for making this transformation:

replace newlines with <br />, but only within <pre> blocks

(Feel free to make simplifying assumptions, and ignore corner cases. For example, we can assume that tags will be one line, and not pathological things like )

Input text:

<div><pre class='some class'>1
2
3
</pre>
<pre>line 1
line 2
line 3
</pre>
</div>

Output:

<div><pre>1<br />2<br />3<br /></pre>
<pre>line 1<br />line 2<br />line 3<br /></pre>
</div>

(Motivating context: trying to close out bug 20760 in a wikimedia SyntaxHighlight_GeSHI extension, and finding the my PHP skills (I mostly do python) aren't up to snuff).

I'm open to other solutions, besides regexen, but small is preferred (as an example, building html parse machinery is overkill).

Was it helpful?

Solution 2

Based on something SilentGhost said (which isn't showing up here for some reason):

<?php
$str = "<div><pre class='some class' >1
2
3
< / pre>
<pre>line 1
line 2
line 3
</pre>
</div>";

$out = "<div><pre class='some class' >1<br />2<br />3<br />< / pre>
<pre>line 1<br />line 2<br />line 3<br /></pre>
</div>";

function protect_newlines($str) {
    // \n -> <br />, but only if it's in a pre block
    // protects newlines from Parser::doBlockLevels()
    /* split on <pre ... /pre>, basically.  probably good enough */
    $str = " ".$str;  // guarantee split will be in even positions
    //$parts = preg_split('/(<pre .*  pre>)/Umsxu',$str,-1,PREG_SPLIT_DELIM_CAPTURE);
    $parts = preg_split("/(< \s* pre .* \/ \s* pre \s* >)/Umsxu",$str,-1,PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $idx=>$part) {
        if ($idx % 2) {
            $parts[$idx] = preg_replace("/\n/", "<br />", $part);
        }
    }
    $str = implode('',$parts);
    /* chop off the first space, that we had added */
    return substr($str,1);
}

assert(protect_newlines($str) === $out);
?>

OTHER TIPS

Something like this?

<?php

$content = "<div><pre class='some class'>1
2
3
</pre>
<pre>line 1
line 2
line 3
</pre>
</div>
";

function getInnerHTML($Node)
{
     $Body = $Node->ownerDocument->documentElement->firstChild->firstChild;
     $Document = new DOMDocument();    
     $Document->appendChild($Document->importNode($Body,true));
     return $Document->saveHTML();
}

$dom = new DOMDocument();
$dom->loadHTML( $content );
$preElements = $dom->getElementsByTagName('pre');

if ( count( $preElements ) ) {
    foreach ( $preElements as $pre ) {
    $value = preg_replace( '/\n|\r\n/', '<br/>', $pre->nodeValue  );
    $pre->nodeValue = $value;
    }

    echo html_entity_decode( getInnerHTML( $dom->documentElement ) );
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top