Question
I noticed that on this URL: http://www.bubbleroom.se/sv/kläder/kvinna/controlbody/bodys/body-nero there is a null character (U+0000) in the tag with id prodText.
The whole document seems to end at this null character: anything that comes after it cannot be extracted.
This is the code that "doesn't" work. It works in general, but not when there is a null character in the $html string:
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
return new DOMXPath($dom);
Solution
I solved the issue by simply stripping null characters from the HTML before creating the XPath instance, with the following code:
$html = str_replace("\0", "", $html);
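Put together, the filter and the DOM setup from the question might look like the sketch below. The helper name createXPath and the sample markup are made up for illustration and are not taken from the page in question; only the str_replace call and the DOMDocument/DOMXPath steps come from the post.

```php
<?php
// Hypothetical helper (name is an assumption, not from the original post):
// strips null bytes from the markup, then builds a DOMXPath for it.
function createXPath(string $html): DOMXPath
{
    // Remove every null byte so the parser does not stop at the first one.
    $html = str_replace("\0", "", $html);

    $dom = new DOMDocument;
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    // Restore whatever error-handling mode was active before.
    libxml_use_internal_errors($previous);

    return new DOMXPath($dom);
}

// Made-up sample markup with a null byte inside the element with id "prodText".
$html = "<html><body><div id=\"prodText\">Body\0 Nero</div>"
      . "<p id=\"after\">still reachable</p></body></html>";

$xpath = createXPath($html);
$node = $xpath->query('//p[@id="after"]')->item(0);
echo $node === null ? "not found" : $node->textContent, PHP_EOL;
```

Note that this removes the null bytes rather than replacing them, so text on either side of a null byte is joined together. If that matters, replacing "\0" with a space instead is an equally small change.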