Question

I use the sgml library of prolog to extract information about a web page. I use this instruction to extract all:

load_structure('file.html', List, [dialect(sgml), shorttag(false), max_errors(-1)])

the system loads the page but i have some warnings, for instance:

WARNING:SGML2PL(sgml): inserted omitted end-tag for "img"
WARNING:SGML2PL(sgml): inserted omitted end-tag for "br"
WARNING:SGML2PL(sgml): entity "amp" does not exist

How can i do to eliminate this warnings?

Était-ce utile?

La solution

I use this syntax

get_html_file(FileOrStream, P) :-
        dtd(html, DTD),
        load_structure(FileOrStream, [P],
                       [ dtd(DTD),
                         dialect(sgml),
                         shorttag(false),
                         syntax_errors(quiet),
                         max_errors(-1)
                       ]).

the option syntax_errors(quiet) should do.

I recall I had some hard time parsing old pages with errors. Error handling can be complicated, some tool like tags soup, being more tolerant, could help in getting the work sone...

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top