Why are there two html nodes in a single-node html document?

Question 1

The first node is a document type node (nodeType 10, XML_DOCUMENT_TYPE_NODE). It seems to always be created, even when the input has no <!DOCTYPE>. I suppose DOMDocument needs it to process the rest of the document.

The second node is the html element, i.e. your "real" document.

You can see the contents quickly like so:

$text = '<!DOCTYPE html><html><head><title>Demo</title><script>var a=10;</script></head><body>Bla</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($text);

foreach ($dom->childNodes as $node)
{
    echo $node->nodeName;
    echo "<pre>";print_r($node);echo"</pre><br>\n";
    echo "<pre>";print_r(getArray($node));echo"</pre><br>";
    echo "<br>================================<br>";
}

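// Recursively convert a DOM node into a nested array of its attributes and children.
function getArray($node)
{
    // Start from an array: writing keys into `false` is deprecated since PHP 8.1.
    $array = [];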

    if ($node->hasAttributes())
    {
        foreach ($node->attributes as $attr)
        {
            $array[$attr->nodeName] = $attr->nodeValue;
        }
    }

    if ($node->hasChildNodes())
    {
        if ($node->childNodes->length == 1)
        {
            $array[$node->firstChild->nodeName] = $node->firstChild->nodeValue;
        }
        else
        {
            foreach ($node->childNodes as $childNode)
            {
                if ($childNode->nodeType != XML_TEXT_NODE)
                {
                    $array[$childNode->nodeName][] = getArray($childNode);
                }
            }
        }
    }

    return $array;
}
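
If the full dump is more than you need, a shorter check of the two top-level nodes can be sketched like this. It only uses the standard nodeType, doctype and documentElement properties. The sample markup is an assumption chosen for illustration, and it deliberately has no <!DOCTYPE>.

```php
<?php
// Sample input without a doctype (illustrative markup, not from the question).
$dom = new DOMDocument();
$dom->loadHTML('<html><head><title>Demo</title></head><body>Bla</body></html>');

foreach ($dom->childNodes as $node) {
    // nodeType 10 is XML_DOCUMENT_TYPE_NODE, nodeType 1 is XML_ELEMENT_NODE.
    echo $node->nodeType, ' ', $node->nodeName, "\n";
}

// The same two nodes are also reachable directly.
var_dump($dom->doctype instanceof DOMDocumentType);
var_dump($dom->documentElement->nodeName);
```

The loop should list one doctype node followed by one element node, which matches the two entries you are seeing.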

Question 2

If no doctype is present in the input passed to loadHTML($input), loadHTML adds its own default doctype. Moreover, as you mentioned yourself, removing all the html tags yields the same result.

If you run this code:

$text = '<head><title>Demo</title></head><body>Bla</body>';


$dom = new DOMDocument();
$dom->loadHTML($text);

echo $dom->saveHTML();

The output will be:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><title>Demo</title></head><body>Bla</body></html>

So the answer is yes: with a plain loadHTML($input) call there will always be two top-level nodes, the doctype and the html element, and the second one holds the DOM structure.
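
If the extra doctype node gets in your way, one possible way around it is the LIBXML_HTML_NODEFDTD option, which is documented as preventing a default doctype from being added when the input has none. The following is only a sketch and assumes your PHP is built against libxml 2.7.8 or newer; I would expect it to leave the html element as the only top-level node.

```php
<?php
$text = '<head><title>Demo</title></head><body>Bla</body>';

$dom = new DOMDocument();
// Ask libxml not to add a default doctype when the input has none.
$dom->loadHTML($text, LIBXML_HTML_NODEFDTD);

// Inspect what is left at the top level of the document.
var_dump($dom->doctype);
echo $dom->childNodes->length, "\n";
echo $dom->saveHTML();
```

The html wrapper itself is still added, so this changes only the doctype part of the answer.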

PS: loadHTML will also close unclosed tags automatically. See DOMDocument::loadHTML
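
A minimal sketch of that auto-closing behaviour, using a made-up fragment in which neither p tag is closed. The markup is an assumption for illustration only.

```php
<?php
// Collect parser warnings internally instead of printing them,
// in case the malformed markup triggers any.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML('<div><p>One<p>Two</div>');
libxml_clear_errors();

$paragraphs = $dom->getElementsByTagName('p');

// Number of p elements found after parsing.
echo $paragraphs->length, "\n";
// Parent of the second p: it is a sibling of the first p, not nested in it.
echo $paragraphs->item(1)->parentNode->nodeName, "\n";

echo $dom->saveHTML($dom->documentElement), "\n";
```

Both paragraphs end up as separate children of the div, so the parser closed the first p before opening the second.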