Extract doctype with simple_html_dom

https://stackoverflow.com/questions/1566028

21-09-2019

Question

I am using simple_html_dom to parse a website. Is there a way to extract the doctype?

Solution

You can use the file_get_contents function to fetch the page's raw HTML and then pull the doctype out with a regular expression. For example:

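<?php
   // Fetch the raw markup; file_get_contents returns false on failure.
   $html = file_get_contents("http://google.com");
   $doctype = null;
   // Match the first <!DOCTYPE ...> declaration, including short forms
   // such as <!DOCTYPE html>. [^>]* also spans line breaks, so there is
   // no need to strip newlines first.
   if ($html !== false && preg_match('/<!DOCTYPE[^>]*>/i', $html, $matches)) {
       $doctype = $matches[0];
   }
?>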
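
As a minimal sketch, the same regex idea can be wrapped in a function that works on an HTML string you already have, so it can be tried without a network request. The name extract_doctype is hypothetical (it is not part of PHP or simple_html_dom), and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the first <!DOCTYPE ...> declaration found
// in an HTML string, or null when there is none.
function extract_doctype(string $html): ?string
{
    // [^>]* stops at the first '>' and also matches across line breaks.
    if (preg_match('/<!DOCTYPE[^>]*>/i', $html, $matches)) {
        return $matches[0];
    }
    return null;
}

$sample = "<!DOCTYPE html>\n<html><head></head><body></body></html>";
echo extract_doctype($sample), "\n"; // prints <!DOCTYPE html>
```

Note that this simple pattern would cut a doctype short if it contained an internal subset with its own '>' characters, which is rare on ordinary web pages.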

Other tips

You can use $html->find('unknown'). This works in at least version 1.11 of the simple_html_dom library. I use it as follows:

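function get_doctype($doc)
{
    // In simple_html_dom 1.11 the doctype shows up as an 'unknown' element.
    $els = $doc->find('unknown');

    foreach ($els as $el) {
        // Only accept an 'unknown' element that sits directly under the root.
        if ($el->parent()->tag == 'root') {
            return $el;
        }
    }

    return null;
}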

The parent check is only there to skip any other 'unknown' elements that might be found; I'm assuming the first one directly under the root is the doctype. If you want to be sure, you can explicitly inspect ->innertext and check that it starts with '!DOCTYPE '.
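
The innertext check could look something like the sketch below. The helper name looks_like_doctype is hypothetical, and the optional leading '<' in the pattern is an assumption, in case a given version of the library keeps the angle bracket in the text it returns.

```php
<?php
// Hypothetical helper: true when $text looks like a doctype declaration.
// The optional '<' is an assumption, in case the library's innertext
// still includes the opening angle bracket.
function looks_like_doctype(string $text): bool
{
    return preg_match('/^\s*<?!DOCTYPE\s/i', $text) === 1;
}

var_dump(looks_like_doctype('!DOCTYPE html'));       // bool(true)
var_dump(looks_like_doctype('<!DOCTYPE html>'));     // bool(true)
var_dump(looks_like_doctype('?xml version="1.0"?')); // bool(false)
```

Inside get_doctype you could then return $el only when looks_like_doctype($el->innertext) is true, instead of relying on the position of the element alone.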

License: CC-BY-SA with attribution

Not affiliated with Stack Overflow