How to extract title and meta description using PHP Simple HTML DOM Parser?

https://stackoverflow.com/questions/11385774

19-06-2021

Question

How can I extract a page's title and meta description using the PHP Simple HTML DOM Parser?

I just need the title of the page and the keywords in plain text.

Solution

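I just took a look at the HTML DOM Parser. Try:

$html = new simple_html_dom();
$html->load_file('xxx'); // put a URL or filename in place of xxx

// find() without an index returns an array, so pass 0 to get the first match
$title = $html->find('title', 0);
echo $title->plaintext;

// The description text is stored in the meta tag's content attribute
$descr = $html->find('meta[name=description]', 0);
echo $descr->content;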

OTHER TIPS

$html = new simple_html_dom();
$html->load_file('some_url'); 

//To get Meta Title
$meta_title = $html->find("meta[name='title']", 0)->content;

//To get Meta Description
$meta_description = $html->find("meta[name='description']", 0)->content;

//To get Meta Keywords
$meta_keywords = $html->find("meta[name='keywords']", 0)->content;

NOTE: The value of the name attribute is matched case-sensitively, so name='description' will not match a tag written as name="Description".

$html = new simple_html_dom();
$html->load_file('http://www.google.com'); 
$title = $html->find('title',0)->innertext;

$html->find('title') returns an array, so use $html->find('title', 0) to get the first element. The same applies to the meta description selector.

Building on LeiXC's solution above, you need to use the simple_html_dom class:

$html = new simple_html_dom();
$html->load_file('http://websiteurl.com'); // put your own URL here for testing, including the scheme
// load_file() has already parsed the page, so no extra str_get_html() call is needed
$descr = $html->find("meta[name=description]", 0);
$description = $descr->content;
echo $description;

I have tested this code and yes, it is case sensitive (some meta tags use a capital D for Description).

Here is a check that handles both capitalizations and skips pages where the tag is missing:

if( is_object( $html->find("meta[name=description]", 0)) ){
    echo $html->find("meta[name=description]", 0)->content;
} elseif( is_object( $html->find("meta[name=Description]", 0)) ){
    echo $html->find("meta[name=Description]", 0)->content;
}
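
The two-branch check above can be generalized to any capitalization. The following is only a sketch, and it swaps in PHP's built-in DOMDocument instead of Simple HTML DOM so that it runs without extra files. The helper name meta_content_ci and the sample page are hypothetical, not part of either library.

```php
<?php
// Hypothetical helper: returns the content of the first <meta> tag whose
// name matches $name regardless of capitalization, or null if none matches.
function meta_content_ci($htmlString, $name)
{
    $doc = new DOMDocument();
    // Suppress the warnings that loosely written real-world HTML tends to trigger.
    @$doc->loadHTML($htmlString);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // getAttribute() returns an empty string when the attribute is absent.
        if (strcasecmp($meta->getAttribute('name'), $name) === 0) {
            return $meta->getAttribute('content');
        }
    }
    return null;
}

// Hypothetical sample page that uses a capital D.
$page = '<html><head><title>Example</title>'
      . '<meta name="Description" content="A short summary.">'
      . '</head><body></body></html>';

var_dump(meta_content_ci($page, 'description'));
var_dump(meta_content_ci($page, 'keywords'));
```

Returning null for a missing tag lets the caller tell "no such tag" apart from an empty content attribute.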

$html->find('meta[name=keywords]',0)->attr['content'];
$html->find('meta[name=description]',0)->attr['content'];

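$html = new simple_html_dom();
$html->load_file('xxx'); // put a URL or filename in place of xxx

// array_shift() takes its argument by reference, so store the result of find() first
$titles = $html->find('title');
$title = array_shift($titles)->innertext;
echo $title;

$descrs = $html->find("meta[name='description']");
$descr = array_shift($descrs)->content;
echo $descr;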

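You can also do this with PHP's built-in get_meta_tags() function, which is simple to use. For example:

// get_meta_tags() takes a filename or URL and returns an array of name => content pairs
$result = 'site.com';
$tags = get_meta_tags("html/".$result);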
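
As a rough illustration of what get_meta_tags() returns, the sketch below writes a small sample page to a temporary file and reads it back. The sample HTML and variable names are made up for the example; in practice you would pass your own filename or URL.

```php
<?php
// Hypothetical sample page, written to a temporary file so the sketch runs offline.
$sample = '<html><head><title>Example</title>'
        . '<meta name="Description" content="A short summary.">'
        . '<meta name="keywords" content="php, html, parsing">'
        . '</head><body></body></html>';

$path = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($path, $sample);

// get_meta_tags() returns an array keyed by each meta tag's name.
// Keys are lowercased, so "Description" is found under "description".
$tags = get_meta_tags($path);
unlink($path);

// Fall back to an empty string when a tag is missing.
$description = isset($tags['description']) ? $tags['description'] : '';
$keywords    = isset($tags['keywords']) ? $tags['keywords'] : '';

echo $description, "\n";
echo $keywords, "\n";
```

Because the keys are lowercased, this approach sidesteps the capitalization problem mentioned in other answers. It only covers meta tags, though, so the page title still has to be read another way.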

The correct answer is:

$html = str_get_html($html);
$descr = $html->find("meta[name=description]", 0);
$description = $descr->content;

The code above parses the HTML string into an object, and the find method then looks up the meta tag whose name is description. Finally, read the tag's content attribute: a meta tag has no inner text, so innertext and plaintext, as suggested in other answers, return nothing useful.

This has been tested and used in live code.

I found an easy way to get the title and description:

$html = new simple_html_dom();
$html->load_file('your_url');
// load() parses an HTML string, so use find() to select elements instead
$title = $html->find('title', 0)->plaintext; //<title>**Text from here**</title>
$description = $html->find("meta[name='description']", 0)->content; //<meta name="description" content="**Text from here**">

If the result contains leading or trailing whitespace, trim it:

$title = trim($title);
$description = trim($description);

Licensed under: CC-BY-SA with attribution
