Question

I currently have the following content:

<section>
<hgroup>
<h1 style="text-align: center;">Koptitel 1</h1>
<h2 style="text-align: center;">Subtitel</h2>
</hgroup>
<ul class="sample1">
    <li class="sample2">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat, urna eget ultricies egestas, lectus mi tincidunt nulla, ut molestie odio lectus ut arcu.</li>
    <li class="sample2">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat, urna eget ultricies egestas, lectus mi tincidunt nulla, ut molestie odio lectus ut arcu."</li>
</ul>
</section>

Sandbox URL: http://regex101.com/r/zQ0lN5

I have the following code in PHP:

$new_content = preg_replace('/(?<=<ul class="sample1">|<\/li>)\s*?(?=<\/ul>|<li.*?>)/is', '', $content);

This works: the whitespace between the ul and li tags and between the li items is removed, so the output is as expected:

<section>
<hgroup>
<h1 style="text-align: center;">Koptitel 1</h1>
<h2 style="text-align: center;">Subtitel</h2>
</hgroup>
<!-- SEE BELOW NO WHITE SPACES -->
<ul class="sample1"><li class="sample2">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat, urna eget ultricies egestas, lectus mi tincidunt nulla, ut molestie odio lectus ut arcu.</li><li class="sample2">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat, urna eget ultricies egestas, lectus mi tincidunt nulla, ut molestie odio lectus ut arcu."</li></ul>
</section>

I would rather do the following:

//Ignore what's between < and > : <ul.*?>
$new_content = preg_replace('/(?<=<ul.*?>|<\/li>)\s*?(?=<\/ul>|<li.*?>)/is', '', $content);

That way a coder can add a style attribute or anything else to the ul tag and the code still won't break. However, lookbehinds in PCRE must have a fixed length, so quantifiers such as .*? are not allowed inside them. How do I fix this?


Solution

Maybe this will do the trick. You don't need lookbehinds.

// Remove whitespace directly before <li ...> or </ul>; $1 puts the matched tag start back.
echo preg_replace('/\s+(<(?:\/ul>|li[\s>]))/i', '$1', $your_document);

Here $your_document is the HTML code you want to process.

So, if this is your HTML:

<section>
<hgroup>
<h1 style="text-align: center;">Koptitel 1</h1>
<h2 style="text-align: center;">Subtitel</h2>
</hgroup>
<ul class="sample1">
    <li class="sample2">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat, urna eget ultricies egestas, lectus mi tincidunt nulla, ut molestie odio lectus ut arcu.</li>
    <li class="sample2">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat, urna eget ultricies egestas, lectus mi tincidunt nulla, ut molestie odio lectus ut arcu.</li>
</ul>
</section>

Output for that looks like:

<section>
<hgroup>
<h1 style="text-align: center;">Koptitel 1</h1>
<h2 style="text-align: center;">Subtitel</h2>
</hgroup>
<ul class="sample1"><li class="sample2">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat, urna eget ultricies egestas, lectus mi tincidunt nulla, ut molestie odio lectus ut arcu.</li><li class="sample2">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat, urna eget ultricies egestas, lectus mi tincidunt nulla, ut molestie odio lectus ut arcu.</li></ul>
</section>
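
To check the requirement from the question (extra attributes on the ul tag should not break anything), here is a minimal sketch. The function name strip_list_whitespace and the sample markup are made up for illustration; the pattern is an equivalent form of the one above, using \s+ because \s already matches newlines.

```php
<?php
// Hypothetical helper; wraps the capture-group pattern from the answer.
function strip_list_whitespace(string $html): string
{
    // Match whitespace that directly precedes "<li>", "<li ...>" or "</ul>"
    // and replace it with just the captured tag start ($1).
    return preg_replace('/\s+(<(?:\/ul>|li[\s>]))/i', '$1', $html);
}

// Sample input: the ul tag carries an extra style attribute.
$html = "<ul class=\"sample1\" style=\"margin: 0\">\n"
      . "    <li class=\"sample2\">One</li>\n"
      . "    <LI>Two</LI>\n"
      . "</ul>";

echo strip_list_whitespace($html), PHP_EOL;
// prints: <ul class="sample1" style="margin: 0"><li class="sample2">One</li><LI>Two</LI></ul>
```

Because the pattern only looks at what follows the whitespace, the attributes of the opening ul tag never enter the match.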

The regular expression removes all whitespace, including newline (\n) characters, between <ul> and <li>, between </li> and <li>, and between </li> and </ul>, so the entire <ul> element ends up on one line with no spaces between > and < inside it. Because the match does not depend on the opening <ul> tag, that tag can carry any attributes. The expression is case-insensitive, so it matches <LI> as well as <li>.
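
If you do want to keep the restriction on what comes before the whitespace, as in the pattern from the question, one possible alternative (not part of the answer above) is PCRE's \K, which discards everything matched so far from the reported match and so can stand in for a variable-length lookbehind. The sketch below assumes well-formed markup; the function name is hypothetical.

```php
<?php
// Hypothetical helper; a \K-based variant of the pattern from the question.
function strip_list_whitespace_after_tags(string $html): string
{
    // (?:<ul\b[^>]*>|<\/li>)  an opening ul tag with any attributes, or </li>
    // \K                      keep that tag out of the text being replaced
    // \s+                     the whitespace to remove
    // (?=<\/ul>|<li[\s>])     only when </ul> or an li tag follows
    return preg_replace(
        '/(?:<ul\b[^>]*>|<\/li>)\K\s+(?=<\/ul>|<li[\s>])/i',
        '',
        $html
    );
}

$html = "<ul id=\"nav\" style=\"color: red\">\n  <li>A</li>\n  <li>B</li>\n</ul>";

echo strip_list_whitespace_after_tags($html), PHP_EOL;
// prints: <ul id="nav" style="color: red"><li>A</li><li>B</li></ul>
```

Using [^>]* instead of .*? keeps the match inside a single tag, assuming no attribute value contains a literal > character.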

Licensed under: CC-BY-SA with attribution