You are using the wrong method to load the XML.
Use load to load an XML file or loadXML to load an XML string. Use loadHTMLFile to load an HTML file and loadHTML to load HTML content.
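As a minimal sketch of the two XML entry points (the sample document, the variable names, and the temporary file are assumptions made up for illustration, not taken from the question):

```php
<?php
// Hypothetical sample document, used only for illustration.
$xml = '<?xml version="1.0" encoding="UTF-8"?><catalog><book id="b1">Dune</book></catalog>';

// XML held in a string: use loadXML().
$fromString = new DOMDocument();
$fromString->loadXML($xml);

// XML stored on disk: use load() with a path.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, $xml);
$fromFile = new DOMDocument();
$fromFile->load($path);
unlink($path); // remove the temporary file again

// Both documents keep the root element exactly as written.
echo $fromString->documentElement->nodeName, "\n"; // catalog
echo $fromFile->documentElement->nodeName, "\n";   // catalog
```

Both calls go through the XML parser, so the tree mirrors the input without any extra wrapper elements.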
Using one of the HTML methods will trigger libxml's HTML parser module, which is an HTML 4.0 non-verifying parser with API compatible with the XML parser ones. It should be able to parse "real world" HTML, even if severely broken from a specification point of view.
The HTML parser module always uses HTML 4 Transitional as the DTD. It also parses the document with lenient error handling and tries to auto-correct it, for instance by wrapping partial markup in an HTML skeleton.
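The difference can be seen by feeding the same fragment to both parsers. This is only a sketch: the fragment and variable names are hypothetical, and the exact serialized output (including the doctype line) may vary with the libxml version PHP was built against.

```php
<?php
// Hypothetical fragment: well-formed XML, but only a partial HTML document.
$fragment = '<p>Hello</p>';

// Collect parser warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// XML parser: the tree is exactly what was passed in.
$asXml = new DOMDocument();
$asXml->loadXML($fragment);
echo $asXml->documentElement->nodeName, "\n";

// HTML parser: the fragment gets wrapped in an html/body skeleton.
$asHtml = new DOMDocument();
$asHtml->loadHTML($fragment);
echo $asHtml->documentElement->nodeName, "\n";

// Serialize the HTML tree to inspect what the parser added.
echo $asHtml->saveHTML();

libxml_clear_errors();
```

With the XML parser the root element stays p, while the HTML parser is expected to produce html as the root with the paragraph nested inside body. That is why XPath queries or child lookups written against the original XML structure stop matching after loadHTML.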