Question

In the first step of the html5lib tutorial I see pretty confusing behavior.

The docs say:

import html5lib
f = open("mydocument.html")
doc = html5lib.parse(f)

This will return a tree in a custom "simpletree" format.

The file is a normal HTML document, but in my case the result is:

<None>
>>> doc is None
False

I believe this is not right, but I have no idea what is happening.

Edit:

If I call the read method on the opened file, it returns the file contents as a string:

f = open("mydocument.html")
f.read()
# returns string with html

But after doc = html5lib.parse(f), f.read() returns an empty string, as if the file had already been read.
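The empty string is ordinary file-object behavior: each read advances the stream position, so once something (here, presumably the parser) has read the stream to the end, further reads return ''. The minimal sketch below uses io.StringIO as a stand-in for the opened file, an assumption made so the example is self-contained. It shows the effect and how seek(0) rewinds the stream.

```python
import io

# io.StringIO stands in for open("mydocument.html"); a real text file behaves the same way
f = io.StringIO("<html><body><p>Hello</p></body></html>")

first = f.read()   # consumes the whole stream; the position is now at the end
second = f.read()  # nothing is left to read, so this is an empty string

f.seek(0)          # rewind to the start of the stream
third = f.read()   # the full contents are available again

print(repr(second))    # ''
print(first == third)  # True
```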


Solution

  • The <None> doesn't really mean that your document was not parsed; it just means that your document has no name. If you do

    doc.name = "test"
    print(doc)
    

    it should show <test>

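  • parse can also take a string as its argument. Note that in current html5lib releases that string is parsed as the HTML markup itself rather than opened as a filename, so pass either the open file object or its contents (f.read()).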

  • Try print(doc.toxml()).
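On the point about <None>: that text is just how the tree object represents itself. It is not the value None. The minimal sketch below illustrates the idea with a hypothetical Node class. This is not html5lib's actual code, and it assumes the repr is built from the node's name attribute.

```python
class Node:
    """Hypothetical stand-in for a tree node whose repr is built from its name."""

    def __init__(self, name=None):
        self.name = name

    def __repr__(self):
        # Only the name is formatted, so a nameless node prints as <None>
        return "<%s>" % self.name


doc = Node()
print(doc)          # <None>
print(doc is None)  # False: the object exists, it just has no name

doc.name = "test"
print(doc)          # <test>
```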
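On the toxml() point: newer html5lib releases default to the etree treebuilder. In that case parse returns an xml.etree.ElementTree element instead of a simpletree node, and such elements have no toxml() method. The stdlib-only sketch below serializes that kind of tree with ET.tostring. A hand-built element stands in for the parser's result, which is an assumption: real output would also carry the XHTML namespace on tag names unless namespaceHTMLElements=False is passed to parse.

```python
import xml.etree.ElementTree as ET

# Hand-built stand-in for the element html5lib.parse(...) returns with the etree treebuilder
root = ET.Element("html")
body = ET.SubElement(root, "body")
p = ET.SubElement(body, "p")
p.text = "Hello"

print(root)  # <Element 'html' at 0x...>: a repr, not the document contents

# Serialize the tree to see the markup
markup = ET.tostring(root, encoding="unicode")
print(markup)  # <html><body><p>Hello</p></body></html>
```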

Licensed under: CC-BY-SA with attribution