XHTML support in pisa v3.0.33

https://stackoverflow.com/questions/19339874

30-06-2022

Question

I am trying to convert HTML to PDF using pisa. I am using the following line of code:

pisa.CreatePDF(htmlCode, pdfFile, xhtml=True)

I get the following error: pdf creation failed with error 'module' object has no attribute 'XHTMLParser'

I have html5lib 1.0b3 installed. This used to work, but something changed (maybe I updated some of the modules). Does anyone know why I keep getting the above error?

When I do not pass xhtml=True, the call succeeds, but the generated PDF is all wrong. Can I get around this somehow? Is it possible to convert a web page from XHTML to HTML?

How do I know whether a particular page is XHTML or not?

The last two questions might not make sense, because I do not write HTML code and can only read it.

Thanks for any help.

Solution

There is no XHTMLParser in html5lib, which is why the attribute lookup fails. The pisa source code itself admits this, so the xhtml=True flag is effectively broken:

if xhtml:
    #TODO: XHTMLParser doesn't see to exist...
    parser = html5lib.XHTMLParser(tree=treebuilders.getTreeBuilder("dom"))
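
As a stopgap, the flag can be dropped whenever the parser class is missing. The following is only a minimal sketch: has_xhtml_parser and create_pdf_kwargs are hypothetical helper names, not part of pisa or html5lib, and the sketch assumes the only thing that matters is whether the html5lib module exposes an XHTMLParser attribute.

```python
import importlib


def has_xhtml_parser(module_name="html5lib"):
    """Return True if the named module imports and exposes XHTMLParser."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module is not installed at all, so the parser cannot exist.
        return False
    return hasattr(module, "XHTMLParser")


def create_pdf_kwargs(want_xhtml, module_name="html5lib"):
    """Build extra keyword arguments for pisa.CreatePDF.

    xhtml=True is only included when the parser it relies on is
    actually available; otherwise the flag is silently dropped.
    """
    if want_xhtml and has_xhtml_parser(module_name):
        return {"xhtml": True}
    return {}
```

With these helpers the original call would become pisa.CreatePDF(htmlCode, pdfFile, **create_pdf_kwargs(True)), which falls back to the regular HTML parser instead of raising the AttributeError.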

Fortunately, XHTML is often valid HTML as well, so you usually don't need any conversion. Call pisa.CreatePDF without xhtml=True and find out why the generated PDF is all wrong; XHTML is not the problem here.
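
As for telling whether a page is XHTML: a page usually announces it through its DOCTYPE or through the XHTML namespace on the html element. Here is a rough heuristic using only the standard library; looks_like_xhtml is a hypothetical name, and the check ignores the Content-Type header a server may send (for example application/xhtml+xml), so treat the result as a hint rather than proof.

```python
import re

XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml"

# A W3C XHTML DOCTYPE, e.g. "-//W3C//DTD XHTML 1.0 Strict//EN".
_DOCTYPE_RE = re.compile(
    r"<!DOCTYPE\s+html\s+PUBLIC\s+[\"']-//W3C//DTD XHTML",
    re.IGNORECASE,
)

# An <html> start tag that declares the XHTML namespace.
_NAMESPACE_RE = re.compile(
    r"<html[^>]*\sxmlns\s*=\s*[\"']" + re.escape(XHTML_NAMESPACE) + r"[\"']",
    re.IGNORECASE,
)


def looks_like_xhtml(markup):
    """Heuristically report whether a page declares itself as XHTML."""
    # Both declarations appear near the top, so only the start is scanned.
    head = markup[:2048]
    return bool(_DOCTYPE_RE.search(head) or _NAMESPACE_RE.search(head))
```

If this returns True for your page, it is still worth trying the plain HTML path first, since the parser is lenient about XHTML-style markup.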

Licensed under: CC-BY-SA with attribution
