Question

I am parsing web page data using file_get_contents(). Now I want to extract the first 150 characters as a description for that URL.

$url = 'http://crewow.com/CSS_Layout_Tutorial.php';
$data = file_get_contents($url);
$content = plaintext($data);
$Preview = trim_display(140, $content); // show the first characters of the web page as a preview
echo $Preview;

function trim_display($size, $string)
{
    echo "string is  : $string <br/>";

    $trim_string = substr($string, 0, 150);

    $trim_string = $trim_string . "...";
    echo "Trim string is  $trim_string <br/>";
    return $trim_string;
}

function plaintext($html)
{
    // remove the title
    $plaintext = preg_replace('#([<]title)(.*)([<]/title[>])#s', ' ', $html);
    //$plaintext = preg_match('#<title>(.*?)</title>#', $html);

    // remove comments and any content found in the comment area (strip_tags only removes the actual tags)
    $plaintext = preg_replace('#<!--.*?-->#s', '', $plaintext);

    // put a space between list items (strip_tags just removes the tags)
    $plaintext = preg_replace('#</li>#', ' </li>', $plaintext);

    // remove all script and style tags together with their content
    $plaintext = preg_replace('#<(script|style)\b[^>]*>(.*?)</(script|style)>#is', "", $plaintext);

    // remove all remaining html
    $plaintext = strip_tags($plaintext);
    return $plaintext;
}

This code works well for some URLs, but for a few it shows nothing in $Preview. The data is passed to trim_display() correctly, but the line $trim_string = substr($string, 0, 150); seems to fail.

Its output remains empty.


Solution

Your code is actually correct and works as written. The problem is that, for this page, the first 150 characters of the extracted text contain nothing visible; they appear to be only whitespace. Try 5000 instead:

$trim_string = substr($string, 0, 5000);

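To understand the problem, look at the output with your browser's View Source. The visible text only starts after a long run of whitespace, and the browser does not display that whitespace.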
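
The effect can be reproduced with a minimal sketch. It assumes the stripped text begins with a long run of whitespace; the sample string below is made up for illustration and is not taken from the actual page:

```php
<?php
// Hypothetical stand-in for the output of plaintext(): 200 whitespace
// characters (left behind by stripped tags) followed by the real text.
$content = str_repeat(" \n", 100) . "CSS Layout Tutorial";

// Without trim(): the first 150 characters are all whitespace,
// which a browser renders as nothing.
$raw = substr($content, 0, 150);
var_dump(trim($raw) === '');   // bool(true)

// With trim(): the preview starts at the first visible character.
$trimmed = substr(trim($content), 0, 150);
echo $trimmed, "\n";           // CSS Layout Tutorial
```

So substr() itself is not failing; it returns 150 characters that happen to be invisible.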

You can use this code instead of yours. It trims the content before cutting it, so it should work:

$url = 'http://crewow.com/CSS_Layout_Tutorial.php';
$data = file_get_contents($url);
$content = plaintext($data);
//echo trim($content);
// trim() strips the leading whitespace so the preview starts with visible text
$Preview = trim_display(150, trim($content)); // show the first 150 characters of the web page as a preview
echo $Preview;

function trim_display($size, $string)
{
    //echo "string is  : $string <br/>";

    // keep only the first $size characters
    $trim_string = substr($string, 0, $size);

    $trim_string = $trim_string . "...";
    //echo "Trim string is  $trim_string <br/>";
    return $trim_string;
}

function plaintext($html)
{
    // remove the title
    $plaintext = preg_replace('#([<]title)(.*)([<]/title[>])#s', ' ', $html);
    //$plaintext = preg_match('#<title>(.*?)</title>#', $html);

    // remove comments and any content found in the comment area (strip_tags only removes the actual tags)
    $plaintext = preg_replace('#<!--.*?-->#s', '', $plaintext);

    // put a space between list items (strip_tags just removes the tags)
    $plaintext = preg_replace('#</li>#', ' </li>', $plaintext);

    // remove all script and style tags together with their content
    $plaintext = preg_replace('#<(script|style)\b[^>]*>(.*?)</(script|style)>#is', "", $plaintext);

    // remove all remaining html
    $plaintext = strip_tags($plaintext);
    return $plaintext;
}
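
As a possible refinement, whitespace inside the text could be collapsed as well, so that the preview is not padded with newlines and tabs from the original markup. The sketch below is only one way to do it; make_preview() is a hypothetical helper name, not part of the code above, and it assumes single-byte text:

```php
<?php
// make_preview() is a hypothetical helper, not part of the original code.
function make_preview($text, $size = 150)
{
    // Collapse every run of whitespace (spaces, tabs, newlines) into one
    // space, then strip what is left at both ends.
    $text = trim(preg_replace('/\s+/', ' ', $text));

    // Short enough already: return it unchanged, without an ellipsis.
    if (strlen($text) <= $size) {
        return $text;
    }

    // substr() counts bytes; for multibyte text mb_substr() would be
    // the safer choice.
    return substr($text, 0, $size) . '...';
}

echo make_preview("  \n\n  CSS   Layout \t Tutorial  \n"), "\n"; // CSS Layout Tutorial
echo make_preview(str_repeat('ab ', 100), 10), "\n";             // ab ab ab a...
```

It would be called on the result of plaintext(), for example make_preview(plaintext($data), 150).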
Licensed under: CC-BY-SA with attribution