html5lib returns <None>

Question

In the first step of the html5lib tutorial I see rather confusing behavior.

The docs say:

```
import html5lib
f = open("mydocument.html")
doc = html5lib.parse(f)
```

This will return a tree in a custom "simpletree" format.

The file is an ordinary HTML document, but in my case the result is:

```
<None>
>>> doc is None
False
```

I don't think this is right, but I have no idea what is happening.

Edit:

If I call the read method on the opened file, it returns the file contents as a string:

```
f = open("mydocument.html")
f.read()
# returns a string with the HTML
```

But after doc = html5lib.parse(f), f.read() returns an empty string, as if the file had already been read.
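The empty string is ordinary file-object behavior: to build the tree, parse has to read the stream to its end, and a later read() continues from wherever the position was left. The sketch below uses io.StringIO in place of a real file and a hypothetical consume() function as a stand-in for html5lib.parse; seek(0) rewinds the stream if you need the text again.

```python
import io

def consume(stream):
    """Hypothetical stand-in for a parser: reads the whole stream,
    as html5lib.parse must do to build the complete tree."""
    return stream.read()

# io.StringIO behaves like a text file returned by open()
f = io.StringIO("<html><body><p>Hello</p></body></html>")

markup = consume(f)   # the "parser" reads everything
after = f.read()      # position is now at EOF, so this is ''

f.seek(0)             # rewind to the start of the stream
again = f.read()      # the contents can be read again

print(repr(after))
print(again == markup)
```

With a real file the same applies: call f.seek(0) after parsing, or read the contents into a string once and reuse it.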

Solution

The <None> doesn't mean that your document wasn't parsed; it just means that your document has no name. If you do
```
doc.name = "test"
print(doc)
```
it should show <test>.
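A valid object can print as <None> because print() uses the object's own __str__/__repr__, which the class defines. The sketch below is a hypothetical Node class, assuming (as the output in the question suggests) that the printed form is built from the node's name; it is an illustration, not html5lib's actual simpletree source.

```python
class Node:
    """Hypothetical illustration: a tree node whose printed
    form is built from its name (not html5lib's real code)."""

    def __init__(self, name=None):
        self.name = name

    def __repr__(self):
        # Show the node as <name>; a nameless node prints as <None>
        return "<%s>" % self.name

doc = Node()          # a document node with no name
print(doc)            # <None>
print(doc is None)    # False: a real object, just unnamed

doc.name = "test"
print(doc)            # <test>
```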
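parse also accepts a string, but in current html5lib releases a string argument is parsed as the HTML markup itself, not opened as a filename. To parse a file, pass the open file object (as above) or read its contents into a string first.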
Also try print(doc.toxml()) to see the parsed document as markup.
