Question

I'm trying to remove the whitespace (blank characters) inside HTML attributes:

Initial

<p style="font-size: 1.5em; color: navy;">
 Text With Spaces
<p style="font-size: 1.2em; color: navy;"> 

Desired

<p style="font-size:1.5em;color:navy;">
 Text With Spaces
<p style="font-size:1.2em;color:navy;">

First I tried removing all the whitespace with preg_replace and then doing a find and replace to put back the space after <p. However, that removed every space from the document, including the ones in the text:

$cont = preg_replace( '/\s+/', '', $original);
$cont2 = str_replace('<p', "<p ", $cont);

Incorrect

<p style="font-size:1.5em;color:navy;">
 TextWithSpaces
<p style="font-size:1.2em;color:navy;">

Solution

This is a little tricky because you want to remove every space inside the tag except the one immediately after the tag name, but it can certainly be done.

$string = '<p style="font-size: 1.5em; color: navy;">
 Text With Spaces
<p style="font-size: 1.2em; color: navy;"> ';

$string = preg_replace_callback('~<([A-Z0-9]+) \K(.*?)>~i', function ($m) {
    // $m[0] holds everything after "<tagname " up to and including ">"
    return preg_replace('~\s*~', '', $m[0]);
}, $string);

print $string;

What I'm doing here is looking for an opening HTML bracket <, followed by a tag name, followed by a space. For the tag name, I'm using a character class [] that allows the letters A-Z (in either case, thanks to the i flag) and the digits 0-9. The + means it must contain at least one of those characters but may contain more, all the way up until it hits a space. Then \K tells the regex to drop everything matched so far from the reported match, so the match effectively starts from this point. After that it looks for any character . any number of times *, but as few as possible ?, until it reaches the next thing in the regex. In this case, the next thing is a closing HTML bracket >.

Then I'm using preg_replace_callback to evaluate each match that it finds. For each match, the callback looks for any run of whitespace \s* and replaces it with nothing.
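To make the effect of \K more concrete, here is a minimal sketch that runs the same pattern with and without it on a single tag. The variable names ($tag, $without, $with) are only for illustration.

```php
<?php
// One tag taken from the example input.
$tag = '<p style="font-size: 1.5em; color: navy;">';

// Without \K, the reported match is the whole tag, including "<p ".
preg_match('~<([A-Z0-9]+) (.*?)>~i', $tag, $without);

// With \K, the reported match starts after "<p ", so the space
// following the tag name is never handed to the callback.
preg_match('~<([A-Z0-9]+) \K(.*?)>~i', $tag, $with);

echo $without[0], "\n"; // <p style="font-size: 1.5em; color: navy;">
echo $with[0], "\n";    // style="font-size: 1.5em; color: navy;">
```

The callback in the main example therefore only ever sees the attribute portion of each tag, which is why the space after the tag name survives.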

This will give you what you are asking for:

<p style="font-size:1.5em;color:navy;">
 Text With Spaces
<p style="font-size:1.2em;color:navy;"> 

However, it will also have a side effect: it removes the spaces that separate one attribute from the next. Take, for example, a string like this:

$string = '<p style="font-size: 1.5em; color: navy;"><a href="http://www.google.com" style="color: orange;">Google Website</a>
 Text With Spaces
<p style="font-size: 1.2em; color: navy;"> <img src="http://www.google.com/images/logo.gif" width=100 height=100>';

Using the above regular expression will give you results like this:

<p style="font-size:1.5em;color:navy;"><a href="http://www.google.com"style="color:orange;">Google Website</a>
 Text With Spaces
<p style="font-size:1.2em;color:navy;"> <img src="http://www.google.com/images/logo.gif"width=100height=100>

To get around that, you need more than one match-and-replace step: you want to strip whitespace only from items wrapped in quotes/ticks, and only when those items are inside HTML tags. You can't do it with a single pattern like this:

<.*?"(.*?)".*?>

because it captures only one quoted value per tag, so it can't handle things like "http://www.google.com" style="color: orange;" where you have multiple sets of quotes inside the HTML tag. To work around this, I first pull out the tags and then, within each tag, pull out the content between each set of quotes for evaluation.
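As a rough illustration of that limitation, the sketch below runs the single pattern against one tag and then, for comparison, collects every double-quoted value from the same tag. The variable names are made up for the example.

```php
<?php
$tag = '<a href="http://www.google.com" style="color: orange;">';

// Single-pattern attempt: there is one capture group, so only the
// first quoted value in the tag is captured.
preg_match('~<.*?"(.*?)".*?>~', $tag, $single);

// Two-step idea: with the tag already isolated, every quoted value
// can be collected (and later rewritten) individually.
preg_match_all('~"[^"]*"~', $tag, $all);

print_r($single[1]); // http://www.google.com
echo "\n";
print_r($all[0]);    // both quoted values, quotes included
```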

$string = preg_replace_callback('~<(.*?)>~i', 

    function($m) {

        // Match each complete "..." or '...' value inside the tag and strip its whitespace.
        $return_var = preg_replace_callback('~"[^"]*"|\'[^\']*\'~', function ($r) {$v = preg_replace('~\s*~', '', $r[0]); return $v;}, $m[0]);
        return $return_var;
    }, 

    $string);

print $string;

This is basically two preg_replace_callback calls nested in one statement. The outer one tells the function to operate on each tag. The inner one tells it to operate on anything inside quotes/ticks within that tag.
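The same two-step idea can be wrapped in a small function, which makes it easier to check on the problem case from earlier. This is only a sketch: the function name strip_quoted_whitespace is hypothetical, and the quote pattern assumes each value is wrapped in a matching pair of " or ' with no > inside it.

```php
<?php
// Hypothetical helper: removes whitespace inside quoted attribute values only.
function strip_quoted_whitespace(string $html): string
{
    // Step 1: visit each tag.
    return preg_replace_callback('~<[^>]*>~', function ($tag) {
        // Step 2: within the tag, visit each "..." or '...' value
        // and remove the whitespace inside it.
        return preg_replace_callback('~"[^"]*"|\'[^\']*\'~', function ($quoted) {
            return preg_replace('~\s+~', '', $quoted[0]);
        }, $tag[0]);
    }, $html);
}

$input  = '<p style="font-size: 1.2em; color: navy;"> <img src="http://www.google.com/images/logo.gif" width=100 height=100>';
$output = strip_quoted_whitespace($input);

echo $output, "\n";
// <p style="font-size:1.2em;color:navy;"> <img src="http://www.google.com/images/logo.gif" width=100 height=100>
```

The spaces between attributes (and between width=100 and height=100) are left alone because they sit outside the quotes. Note that this treats every quoted value the same way, so a value that relies on its spaces, such as a multi-word alt text, would be collapsed too.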

Here is a demo so you can compare the different options

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow