Python using xhtml2pdf to print webpage into PDF

https://stackoverflow.com/questions/23355661

11-07-2023

Question

I am trying to use xhtml2pdf to print a webpage into a PDF file on local disk. I found the example below.

It runs and doesn't return an error. However, it doesn't convert the webpage, only a single sentence: in this case, only 'http://www.yahoo.com/' is written into the PDF file.

How can I actually convert the web page into PDF?

from xhtml2pdf import pisa

sourceHtml = 'http://www.yahoo.com/'
outputFilename = "test.pdf"

def convertHtmlToPdf(sourceHtml, outputFilename):
    resultFile = open(outputFilename, "w+b")
    # sourceHtml is only the URL string here, so pisa renders that text literally
    pisaStatus = pisa.CreatePDF(sourceHtml, resultFile)
    resultFile.close()
    return pisaStatus.err

if __name__=="__main__":
    pisa.showLogging()
    convertHtmlToPdf(sourceHtml, outputFilename)

Solution

xhtml2pdf is not going to work with every website; for one, it does not work for yahoo.com. But the reason it fails here is that you are giving pisa the URL rather than the actual HTML. You need to fetch the HTML first, for example using urllib2 (urllib.request in Python 3):

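from urllib.request import urlopen  # Python 3 equivalent of urllib2.urlopen

# Fetch the HTML first; pisa.CreatePDF expects HTML content, not a URL
with urlopen('http://sheldonbrown.com/web_sample1.html') as url:
    srchtml = url.read()
pisa.showLogging()
convertHtmlToPdf(srchtml, outputFilename)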

With that change it works. Note that this is a very simple sample HTML page.
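A guard against the mistake in the question can be sketched as follows. `pisa.CreatePDF` renders whatever string it receives as markup, which is why the bare URL ended up as text in the PDF. The helpers `looks_like_url` and `load_html` are hypothetical (they are not part of xhtml2pdf), and falling back to UTF-8 when a page declares no charset is an assumption.

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def looks_like_url(source: str) -> bool:
    """Return True if the string parses as an http(s) URL with a host."""
    parsed = urlparse(source.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def load_html(source: str) -> str:
    """Return HTML markup: download it if `source` is a URL, else pass it through.

    Hypothetical helper; pisa.CreatePDF itself performs no such check.
    """
    if not looks_like_url(source):
        return source  # already markup, hand it to pisa unchanged
    with urlopen(source) as response:
        # Assumption: fall back to UTF-8 if the server declares no charset
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

You would then call `pisa.CreatePDF(load_html(sourceHtml), resultFile)`, so that either a URL or ready-made markup produces a rendered page.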

OTHER TIPS

Thanks to CT Zhu's help. Here is the working version, for reference:

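from urllib.request import urlopen  # Python 3; the original used urllib2.urlopen

from xhtml2pdf import pisa

sourceUrl = 'http://sheldonbrown.com/web_sample1.html'
outputFilename = "test555.pdf"

def convertHtmlToPdf(sourceHtml, outputFilename):
    # Write the PDF in binary mode; the with-block closes the file for us
    with open(outputFilename, "w+b") as resultFile:
        pisaStatus = pisa.CreatePDF(sourceHtml, resultFile)
    # err is non-zero if the conversion failed
    return pisaStatus.err

if __name__ == "__main__":
    pisa.showLogging()
    # Fetch the page's HTML first; CreatePDF needs HTML content, not a URL
    with urlopen(sourceUrl) as url:
        sourceHtml = url.read()
    convertHtmlToPdf(sourceHtml, outputFilename)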
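One possible problem with real pages (an assumption, not something the answers above diagnose) is that pisa only sees the downloaded markup, not the URL it came from, so relative image and stylesheet links cannot be resolved. xhtml2pdf's `CreatePDF` accepts a `link_callback(uri, rel)` argument for resolving such links. The sketch below assumes your xhtml2pdf version can fetch absolute http(s) resources itself; `make_link_callback` is a hypothetical helper name.

```python
from urllib.parse import urljoin


def make_link_callback(base_url):
    """Build a callback that turns relative links into absolute URLs.

    Hypothetical helper: xhtml2pdf calls link_callback(uri, rel) for each
    image/stylesheet reference it encounters.
    """
    def link_callback(uri, rel):
        # Absolute URIs pass through unchanged; relative ones are
        # resolved against the URL of the page the HTML came from
        return urljoin(base_url, uri)
    return link_callback
```

For example: `pisa.CreatePDF(sourceHtml, resultFile, link_callback=make_link_callback('http://sheldonbrown.com/web_sample1.html'))`.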

Licensed under: CC-BY-SA with attribution
