Question

I'm using QueryPath to manipulate a page's DOM. The page I'm manipulating has some tags that QueryPath doesn't know how to interpret.

I've tried passing the following as options but I still get errors:

ignore_parser_warnings
use_parser (html)

I get the following errors with these enabled:

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Tag nobr invalid in Entity

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity

Any help would be greatly appreciated.

Solution

Try the libxml error-handling functions:

libxml_use_internal_errors(true); // collect parser errors internally instead of emitting warnings
$dom->load('whatever'); // or whatever you use for loading the DOM
libxml_clear_errors(); // discard the collected errors

Instead of just clearing the errors, you can opt to handle them (for example, by reading them with libxml_get_errors()), though the above should be sufficient for most cases.
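
If you do want to inspect the errors rather than discard them, a minimal sketch could look like the following. It uses plain DOMDocument rather than QueryPath, and the sample markup in $html is a made-up stand-in for the page from the question.

```php
<?php
// Hypothetical sample markup with the two problems from the question:
// an unknown <nobr> tag and an unescaped ampersand.
$html = '<html><body><nobr>Fish & chips</nobr></body></html>';

// Collect parser errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Each collected entry is a LibXMLError with level, code, line and message.
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}

libxml_clear_errors();                  // empty the error buffer
libxml_use_internal_errors($previous);  // restore the earlier setting
```

Which messages end up in $messages depends on the installed libxml2 version, so treat the list as diagnostic output rather than something to match against.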

Other tips

Use htmlqp() instead of qp(). The htmlqp() function does a substantial amount of fixing for yucky HTML.

Just use an @ in front of your QueryPath functions to suppress the warnings. While invalid HTML may generate warnings, QueryPath can generally handle it just fine.
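
As a rough illustration of the @ approach, here is a sketch that applies it to a plain DOMDocument::loadHTML() call, the function named in the warnings above. The same operator would go in front of a QueryPath call; the markup in $html is a made-up example.

```php
<?php
// Hypothetical markup containing an unknown tag and a bare ampersand.
$html = '<html><body><nobr>Fish & chips</nobr></body></html>';

$dom = new DOMDocument();

// The @ operator silences warnings raised by this one expression only.
$loaded = @$dom->loadHTML($html);

// The document is still parsed despite the invalid markup.
$text = $dom->getElementsByTagName('nobr')->item(0)->textContent;
```

Note that @ hides every warning from that call, not only the parser ones, so the libxml approach is the more selective of the two.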

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow