Question

I use Pisa/xhtml2pdf in my Django apps to generate PDFs from an HTML source. That is:

  1. I generate the HTML file with all the print formatting (e.g. page breaks, header, footer, etc.)
  2. I convert this HTML into PDF using Pisa
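For reference, the two steps above might look roughly like this. It is only a sketch: `build_report_html` and `html_to_pdf_bytes` are made-up helper names, and the only xhtml2pdf call assumed is `pisa.CreatePDF(src, dest=...)` with its `err` status attribute.

```python
import io
from html import escape


def build_report_html(title, rows):
    """Step 1: build the HTML source (hypothetical helper, not part of Pisa)."""
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return (
        "<html><body>"
        f"<h1>{escape(title)}</h1>"
        f"<table>{body}</table>"
        "</body></html>"
    )


def html_to_pdf_bytes(html):
    """Step 2: convert the HTML string to PDF bytes with xhtml2pdf (Pisa)."""
    # Imported lazily so the module loads even where xhtml2pdf is not installed.
    from xhtml2pdf import pisa

    buffer = io.BytesIO()
    status = pisa.CreatePDF(html, dest=buffer)
    if status.err:
        raise RuntimeError("xhtml2pdf reported errors while rendering")
    return buffer.getvalue()
```

In a Django view the returned bytes would typically go into an `HttpResponse` with `content_type="application/pdf"`.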

This process works, but it is slow (especially when dealing with long tables), and I have to write my HTML/CSS around Pisa's features and limitations.

The question is: is this the right way to generate PDFs from a web application (i.e. create HTML and then convert it to PDF), or is there a more direct way, that is, "writing" the PDF with a more suitable language?

Solution

WeasyPrint author here. The point of using HTML/CSS to generate PDF (vs. using a lower-level PDF library directly) is to get automatic layout. It lets you specify high-level constraints like h1 { page-break-after: avoid } and leave it to the layout engine to work out where everything goes, rather than specifying the absolute position of every element yourself. The former is much more maintainable when you make changes to your documents.

Some tools like rst2pdf have their own stylesheet syntax, but that’s just a bad way of re-inventing CSS.

But yes, dumping complex stylesheets made for screen might not give great results. It’s better to build the stylesheets with print in mind, or even use completely different stylesheets with @media print in CSS or <link media="print"> in HTML.
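As a rough illustration of a print-specific stylesheet, here is a sketch that wraps some body HTML with an `@media print` block and hands it to WeasyPrint. The selectors (`nav`, `.no-print`) and the helper names are assumptions made up for the example; the only WeasyPrint call used is `HTML(string=...).write_pdf()`, which returns bytes when no target is given.

```python
# Print-only rules; the selectors are placeholders for whatever the page uses.
PRINT_CSS = """
@media print {
    nav, .no-print { display: none; }
    h1 { page-break-after: avoid; }
    tr { page-break-inside: avoid; }
}
"""


def wrap_with_print_css(body_html, css=PRINT_CSS):
    """Return a full HTML document with the print stylesheet embedded."""
    return (
        "<!DOCTYPE html><html><head>"
        f"<style>{css}</style>"
        f"</head><body>{body_html}</body></html>"
    )


def render_with_weasyprint(html):
    """Render the document to PDF bytes with WeasyPrint."""
    # Imported lazily so the module loads even where WeasyPrint is not installed.
    from weasyprint import HTML

    return HTML(string=html).write_pdf()
```

WeasyPrint renders with the print media type by default, so rules inside `@media print` apply without extra configuration.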

OTHER TIPS

I think generating a PDF from HTML with libraries like Pisa or http://weasyprint.org/ is the simplest approach, because the library takes care of inserting images, CSS, barcodes (in Pisa), etc.

If you want to write the PDF yourself, take a look at Reportlab, but it will take much longer to implement. In both cases I suggest always generating the PDF in the background with celery or python-rq, so the request is not blocked while the PDF is rendered.
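To give an idea of why writing the PDF directly takes longer, here is a minimal sketch with Reportlab's `canvas` API where the caller has to compute every position and page break. `layout_lines`, `lines_to_pdf_bytes`, the margins and the line height are all assumptions chosen for the example.

```python
import io

A4_WIDTH, A4_HEIGHT = 595.28, 841.89  # A4 page size in points


def layout_lines(lines, page_height=A4_HEIGHT, margin=72, line_height=14):
    """Assign each line a y position, starting a new page when space runs out."""
    pages, current = [], []
    y = page_height - margin
    for text in lines:
        if y < margin:  # no room left above the bottom margin
            pages.append(current)
            current = []
            y = page_height - margin
        current.append((y, text))
        y -= line_height
    if current:
        pages.append(current)
    return pages


def lines_to_pdf_bytes(lines):
    """Draw the laid-out lines onto a Reportlab canvas and return PDF bytes."""
    # Imported lazily so the module loads even where Reportlab is not installed.
    from reportlab.pdfgen import canvas

    buffer = io.BytesIO()
    pdf = canvas.Canvas(buffer, pagesize=(A4_WIDTH, A4_HEIGHT))
    for page in layout_lines(lines):
        for y, text in page:
            pdf.drawString(72, y, text)  # absolute x/y position in points
        pdf.showPage()  # finish the current page
    pdf.save()
    return buffer.getvalue()
```

Even this toy example has to do its own pagination, which is the work an HTML/CSS layout engine does for you.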

Pisa is known to have various issues, especially with long tables. In general, one should avoid using Pisa. Other options are:

  • using Reportlab directly
  • z3c.rml (Reportlab template language clone)
  • commercial alternatives:
    • PrinceXML
    • PDFreactor

The general rule when it comes to PDF production: you get what you pay for.

Converters like Pisa or Apache FOP are half-baked solutions that work for simple cases but handle anything more complex poorly.

You can also use the Qt WebKit rendering engine to create PDFs from HTML with http://code.google.com/p/wkhtmltopdf/ and django-wkhtmltopdf.

The advantage is that you can write the HTML and CSS as you would normally for WebKit. This works well if you are outputting an existing web page but may be less appropriate if generating PDFs from scratch.
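If you just want to try the command-line tool before wiring up django-wkhtmltopdf, a sketch like the following calls it through `subprocess`. It assumes a `wkhtmltopdf` binary is on the PATH; the helper names are invented for the example, and the command uses only the basic `wkhtmltopdf [options] <input> <output>` form with `--quiet`.

```python
import subprocess


def build_wkhtmltopdf_command(source, output, binary="wkhtmltopdf"):
    """Build the argument list; source may be a URL or a local HTML file."""
    return [binary, "--quiet", source, output]


def render_to_pdf(source, output):
    """Run wkhtmltopdf and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_wkhtmltopdf_command(source, output), check=True)
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with URLs that contain `&` or spaces.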

Licensed under: CC-BY-SA with attribution