For some reason, the PubMed server is returning that entire XML file as an HTML file with a single <pre>
tag containing the XML. The content also consists of multiple XML fragments (there are several <PubmedArticle>
elements and no container around them). Clearly this is intended to be processed by some wacky custom code.
You could "unwrap" the XML by calling SimpleXML twice, like so:
$outer_xml = simplexml_load_file($local);
$inner_xml = simplexml_load_string('<dummyContainer>' . (string)$outer_xml . '</dummyContainer>');
foreach ( $inner_xml->PubmedArticle as $article )
{
    // etc
}
To explain:
- the outer "XML document" is the HTML, which has a single outer element of <pre>
- casting that to string (which I've done explicitly with (string) for clarity and good habit) will give you the contents of that <pre> tag, i.e. all the <PubmedArticle> elements
- wrapping that content in a <dummyContainer> tag will give you a valid XML document, with each of the <PubmedArticle> elements as a direct child of the root element
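
Here is a rough, self-contained sketch of the same two-pass idea with some error checking added. It is not a tested recipe for the real service: the `$html` string is a made-up stand-in for the downloaded file, the `PMID` child and its values are placeholders for illustration, and it assumes the XML inside the <pre> is entity-escaped (which is what makes the string cast return the markup as text).

```php
<?php
// Hypothetical stand-in for the downloaded file: an HTML page whose <pre>
// holds entity-escaped XML fragments with no single root element.
// The PMID values are placeholders, not real records.
$html = '<pre>'
      . '&lt;PubmedArticle&gt;&lt;PMID&gt;111&lt;/PMID&gt;&lt;/PubmedArticle&gt;' . "\n"
      . '&lt;PubmedArticle&gt;&lt;PMID&gt;222&lt;/PMID&gt;&lt;/PubmedArticle&gt;'
      . '</pre>';

// Collect parse errors quietly instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// First pass: parse the wrapper, whose root element is <pre>.
$outer_xml = simplexml_load_string($html);
if ($outer_xml === false) {
    throw new RuntimeException('Outer document is not well-formed XML');
}

// Second pass: the string cast yields the decoded text of <pre>;
// wrapping it in a dummy root turns the fragments into one document.
$inner_xml = simplexml_load_string('<dummyContainer>' . (string)$outer_xml . '</dummyContainer>');
if ($inner_xml === false) {
    throw new RuntimeException('Inner content is not well-formed XML');
}

// Collect one value per article to show that iteration works.
$pmids = [];
foreach ($inner_xml->PubmedArticle as $article) {
    $pmids[] = (string)$article->PMID;
}

print_r($pmids);
```

With a real download you would go back to simplexml_load_file($local) for the first pass; the checks against false are there because both parsing functions return false on malformed input.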