Question

I use the sgml library of SWI-Prolog to extract information from a web page. I load the whole page with this call:

load_structure('file.html', List, [dialect(sgml), shorttag(false), max_errors(-1)])

The system loads the page, but I get some warnings, for instance:

WARNING:SGML2PL(sgml): inserted omitted end-tag for "img"
WARNING:SGML2PL(sgml): inserted omitted end-tag for "br"
WARNING:SGML2PL(sgml): entity "amp" does not exist

How can I eliminate these warnings?


Solution

I use this predicate:

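:- use_module(library(sgml)).

% get_html_file(+FileOrStream, -P): parse an HTML document against the
% HTML DTD and unify P with its single top-level element.
get_html_file(FileOrStream, P) :-
        dtd(html, DTD),                      % fetch the predefined HTML DTD
        load_structure(FileOrStream, [P],
                       [ dtd(DTD),
                         dialect(sgml),
                         shorttag(false),
                         syntax_errors(quiet), % do not print parser warnings
                         max_errors(-1)        % never abort on too many errors
                       ]).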

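The option syntax_errors(quiet) should be enough to silence the warnings. Passing the HTML DTD should also help with their cause, since the DTD declares empty elements such as img and br and entities such as amp.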
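As a shorter alternative, recent SWI-Prolog versions provide load_html/3 in the same library, which, as far as I know, adds the HTML DTD, max_errors(-1) and syntax_errors(quiet) by default. The sketch below assumes that behaviour; page_links/2 is a hypothetical helper name used only for illustration, and library(xpath) is used to pull out the link targets.

```prolog
:- use_module(library(sgml)).
:- use_module(library(xpath)).

% page_links(+File, -Links): hypothetical helper, not part of any library.
% Parses File as HTML and collects the href attribute of every <a> element.
% load_html/3 is assumed to apply the HTML DTD and quiet error handling,
% so tags like <br> and <img> should not produce warnings.
page_links(File, Links) :-
    load_html(File, DOM, []),
    findall(Href, xpath(DOM, //a(@href), Href), Links).
```

Calling page_links('file.html', Links) should bind Links to the list of link targets found in the page, without the warnings shown in the question.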

I recall having a hard time parsing old pages with errors. Error handling can get complicated; a more tolerant tool such as TagSoup could help get the work done.

Licensed under: CC-BY-SA with attribution