Question

I have a string such as:

<p>
    <style type="text/css">
P { margin-bottom: 0.21cm; direction: ltr; color: rgb(0, 0, 0); }P.western { font-family: "Times New Roman",serif; font-size: 12pt; }P.cjk { font-family: "Arial Unicode MS",sans-serif; font-size: 12pt; }P.ctl { font-family: "Tahoma"; font-size: 12pt; }    </style>
</p>
<p align="CENTER" class="western" style="margin-bottom: 0cm">
    <font size="5" style="font-size: 20pt"><u><b> TEXT I WANT TO GET </b></u></font></p>

How can I strip the HTML and CSS and get only the text?

I'm aware of strip_tags(), and I could write a function with preg_replace, but is there an existing working solution for PHP? Thanks.


Solution

Use:

<?php

$text = '<p>
    <style type="text/css">
P { margin-bottom: 0.21cm; direction: ltr; color: rgb(0, 0, 0); }P.western { font-family: "Times New Roman",serif; font-size: 12pt; }P.cjk { font-family: "Arial Unicode MS",sans-serif; font-size: 12pt; }P.ctl { font-family: "Tahoma"; font-size: 12pt; }    </style>
</p>
<p align="CENTER" class="western" style="margin-bottom: 0cm">
    <font size="5" style="font-size: 20pt"><u><b> TEXT I WANT TO GET </b></u></font></p>';

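// Keep the <style> tags for now so the CSS block can be located and removed with its contents.
$text = strip_tags($text,"<style>");

// substr() takes a length, not an end offset, so compute the block length explicitly.
$start = strpos($text,"<style");
$end = strpos($text,"</style>");
$substring = substr($text,$start,$end-$start+strlen("</style>"));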

$text = str_replace($substring,"",$text);
$text = str_replace(array("\t","\r","\n"),"",$text);
$text = trim($text);

echo $text;

?>
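
As an alternative, here is a minimal sketch that removes the style blocks with preg_replace before calling strip_tags(), so no offsets have to be computed by hand. The function name html_to_text_regex is hypothetical, and the pattern assumes reasonably well-formed markup (regular expressions are not a full HTML parser). It also drops script blocks, which the question does not ask for.

```php
<?php
// Hypothetical helper, not part of PHP: strip CSS/JS blocks, then all remaining tags.
function html_to_text_regex(string $html): string
{
    // Remove every <style>...</style> and <script>...</script> block, contents included.
    // Flags: "i" = case-insensitive, "s" = "." also matches newlines; ".*?" is non-greedy.
    $text = preg_replace('#<(style|script)\b[^>]*>.*?</\1\s*>#is', '', $html);

    // Remove the remaining tags, keeping their text content.
    $text = strip_tags($text);

    // Collapse runs of whitespace into a single space and trim the ends.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$html = '<p><style type="text/css">P { color: red; }</style></p>'
      . '<p class="western"><b> TEXT I WANT TO GET </b></p>';

echo html_to_text_regex($html), "\n"; // prints "TEXT I WANT TO GET"
```

Because the pattern is applied globally, it handles several style blocks in one call.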

OTHER TIPS

To make sure that all occurrences of the style tag are removed and all whitespace is reduced to a single space:

$text = '<p>
<style type="text/css">
P { margin-bottom: 0.21cm; direction: ltr; color: rgb(0, 0, 0); }P.western { font-family: "Times New Roman",serif; font-size: 12pt; }P.cjk { font-family: "Arial Unicode MS",sans-serif; font-size: 12pt; }P.ctl { font-family: "Tahoma"; font-size: 12pt; }
</style>
</p>
<p align="CENTER" class="western" style="margin-bottom: 0cm">
    <font size="5" style="font-size: 20pt"><u><b> TEXT I WANT TO GET </b></u></font> 
</p>
<style>a { color: red}</style>';

$text = strip_tags($text, '<style>');
$start = strpos($text, '<style');
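// Remove every <style>...</style> block.
while ($start !== false) {
    // Look for the closing tag that follows the current opening tag.
    $end = strpos($text, '</style>', $start);
    if ($end === false) {
        // Unclosed <style> tag: stop and let strip_tags() below remove it.
        break;
    }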
    $diff = $end - $start + strlen('</style>');
    $substring = substr($text, $start, $diff);
    $text = str_replace($substring, '', $text);
    $start = strpos($text, '<style');
}

// Remaining <style> if any.
$text = strip_tags($text);

// Remove all new lines and tabs and use a space instead.
$text = str_replace(["\n", "\r", "\t"], ' ', $text);

// Trim left and right.
$text = trim($text);

// Collapse runs of whitespace into a single space.
$text = preg_replace('!\s+!', ' ', $text);

echo $text;

Here's the output:

TEXT I WANT TO GET
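
For markup that is too irregular for string searches, a parser-based variant may be more robust. The following is a rough sketch, assuming PHP's DOM extension (ext-dom) is enabled; the function name html_to_text_dom is hypothetical, and the XML-declaration prefix is a commonly used workaround for making loadHTML() read the input as UTF-8.

```php
<?php
// Hypothetical helper, not part of PHP: parse the HTML and return only its text.
function html_to_text_dom(string $html): string
{
    // Collect parser warnings (e.g. for legacy tags such as <font>) instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Workaround (assumption): the XML declaration hints UTF-8 to the HTML parser.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach (['style', 'script'] as $tag) {
        // Copy the live node list to an array first, because removing nodes mutates it.
        foreach (iterator_to_array($doc->getElementsByTagName($tag)) as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    // Text of all remaining elements, with whitespace collapsed.
    $text = $doc->documentElement->textContent;
    return trim(preg_replace('/\s+/', ' ', $text));
}

$html = '<p><style type="text/css">P { color: red; }</style></p>'
      . '<p align="CENTER"><font size="5"><u><b> TEXT I WANT TO GET </b></u></font></p>'
      . '<style>a { color: red }</style>';

echo html_to_text_dom($html), "\n";
```

Unlike strip_tags(), textContent also decodes HTML entities such as &amp;amp; as part of parsing.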

This version of the function is modified to optionally keep line breaks:

function strip($text,$keepLines=true)
{
    if($keepLines) $text=str_replace(array('</p>','<br/>','<br>'),array("</p> \n","<br/> \n","<br> \n"),$text);
    $text = strip_tags($text,"<style>");

    $start = strpos($text,"<style");
    $end = strpos($text,"</style>");
    if($start!==false && $end!==false)
    {
        // substr() takes a length, not an end offset, so subtract the start position.
        $substring = substr($text,$start,$end-$start+strlen("</style>"));
        $text = str_replace($substring,'',$text);
    }

    if(!$keepLines)
    {
        $text = str_replace(array("\t","\r","\n"),"",$text);
        $text = preg_replace('/\s+/',' ',$text);
    }
    else
    {
        $text = str_replace('  ',' ',$text);
    }   
    return trim($text);
}
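
The same idea of keeping line breaks can also be sketched with regular expressions: turn the tags that end a line into "\n" before stripping everything else. This is an illustrative variant under the assumption that only br tags and closing p tags should produce line breaks; the name html_to_text_lines is hypothetical.

```php
<?php
// Hypothetical helper, not part of PHP: strip tags but keep one line per <br> or </p>.
function html_to_text_lines(string $html): string
{
    // Drop <style>/<script> blocks together with their contents.
    $text = preg_replace('#<(style|script)\b[^>]*>.*?</\1\s*>#is', '', $html);

    // Flatten the source whitespace so that only the markup decides where lines break.
    $text = preg_replace('/\s+/', ' ', $text);

    // <br>, <br/>, <br /> and </p> become newlines.
    $text = preg_replace('#<br\s*/?>|</p\s*>#i', "\n", $text);

    $text = strip_tags($text);

    // Trim every line and drop the empty ones.
    $lines = array_filter(array_map('trim', explode("\n", $text)), 'strlen');
    return implode("\n", $lines);
}

echo html_to_text_lines("<p>First<br>second</p>\n<p> third </p>"), "\n";
// prints:
// First
// second
// third
```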

This worked for me:

function strip_tag_css($text){


    $text = strip_tags($text,"<style>");

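    // Locate the first <style>...</style> block; substr() takes a length, not an end offset.
    $start = strpos($text,"<style");
    $end = strpos($text,"</style>");
    $substring = ($start!==false && $end!==false) ? substr($text,$start,$end-$start+strlen("</style>")) : "";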

    $text = str_replace($substring,"",$text);
    $text = str_replace(array("\t","\r","\n"),"",$text);
    $text = trim($text);

    return $text;
}


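// Pass the raw HTML: calling strip_tags() first would remove the <style> tags but leave the CSS text behind.
$bodymensage = str_replace('  ',' ',html_entity_decode( strip_tag_css($message), ENT_QUOTES, "UTF-8" ));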
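
The html_entity_decode() step matters because strip_tags() removes tags but leaves entities such as &amp;amp; untouched. A small sketch of that step on its own, with a made-up sample string:

```php
<?php
// Sample input (made up for illustration): text that still contains HTML entities.
$stripped = 'Tom &amp; Jerry &quot;quoted&quot; caf&eacute; &#39;x&#39;';

// ENT_QUOTES decodes both double and single quotes; 'UTF-8' is the output charset.
$decoded = html_entity_decode($stripped, ENT_QUOTES, 'UTF-8');

echo $decoded, "\n"; // prints: Tom & Jerry "quoted" café 'x'
```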
Licensed under: CC-BY-SA with attribution