PDF Tables of Arbitrary (within reason) Width

https://stackoverflow.com/questions/832693

08-07-2019

Question

I know PDF generation has been discussed a lot here; however, I've yet to find what I need.

I'm trying to generate PDF reports (mainly tables) from Python. Yes, I've tried ReportLab and Pisa. Both had column content "break out" of its cells in circumstances I didn't think were unreasonable or unrealistic to encounter in production.

When I say reasonable I mean 8 - 12 columns of differing widths. Not 80 - 1200 or some such.

I don't need a native Python solution, as my script will be able to launch an external tool from the Linux command line.

I have these reports working in XHTML and they look more or less perfect ... I'd prefer to leverage them.

What I'm asking is: does anyone know of a tool I can use that will render tables of arbitrary (again within reason) size in PDF with quality near XHTML browser rendering?

I'd like to use something like PrinceXML; however the size of this project doesn't justify the expense of such a tool.

As an aside, I have tried to do what I need in LaTeX, something I'm not opposed to, but if that is a good idea I'd appreciate an example.

Regards, and thanks in advance.

Solution 3

The standalone program wkhtmltopdf is exactly what I needed. Its PDF rendering of XHTML is the best I've seen from a free tool.
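A minimal sketch of driving wkhtmltopdf from a Python script follows. The helper names and the file names report.html and report.pdf are placeholders, and the A4 landscape settings are only an assumption about what suits wide tables. It also assumes wkhtmltopdf is installed and on the PATH.

```python
import subprocess


def build_wkhtmltopdf_command(html_path, pdf_path, landscape=True):
    """Return the argument list for one wkhtmltopdf run (hypothetical helper)."""
    cmd = ["wkhtmltopdf", "--page-size", "A4"]
    if landscape:
        # Landscape pages leave more horizontal room for wide tables.
        cmd += ["--orientation", "Landscape"]
    return cmd + [html_path, pdf_path]


def html_to_pdf(html_path, pdf_path, landscape=True):
    """Render an (X)HTML file to PDF; raises CalledProcessError on failure."""
    cmd = build_wkhtmltopdf_command(html_path, pdf_path, landscape)
    subprocess.run(cmd, check=True)


# Example call (needs wkhtmltopdf and an existing report.html):
# html_to_pdf("report.html", "report.pdf")
```

Passing the arguments as a list avoids shell quoting problems if a report path contains spaces.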

OTHER TIPS

I completely agree with Brandon Craig Rhodes's answer. TeX, plain or with a macro package like LaTeX or ConTeXt, would be a good solution if you need high-quality output. However, TeX is a heavy dependency.

If you are looking for a lighter alternative, you can try to

generate XSL-FO and render it with Apache FOP, or
write a Python wrapper around iText.

Both can do arbitrary-width tables with borders. XSL-FO is not too difficult to learn and, if you are used to XML, is easier to generate than LaTeX code.
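To give an idea of the XSL-FO route, here is a small sketch that builds a bordered table with Python's standard xml.etree module. The helper names fo and table_to_xsl_fo, the landscape A4 page size, and the equal-width columns are all assumptions for illustration. I have not tuned it against a particular FOP version.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag, parent=None, **attrs):
    """Create an fo:<tag> element; underscores in keyword names become hyphens."""
    attrs = {key.replace("_", "-"): value for key, value in attrs.items()}
    name = "{%s}%s" % (FO_NS, tag)
    if parent is None:
        return ET.Element(name, attrs)
    return ET.SubElement(parent, name, attrs)


def table_to_xsl_fo(rows):
    """Return an XSL-FO document (as a string) holding one bordered table."""
    root = fo("root")
    masters = fo("layout-master-set", root)
    page = fo("simple-page-master", masters, master_name="page",
              page_width="297mm", page_height="210mm")
    fo("region-body", page, margin="15mm")
    sequence = fo("page-sequence", root, master_reference="page")
    flow = fo("flow", sequence, flow_name="xsl-region-body")
    # Fixed layout with proportional columns: every column gets an equal share.
    table = fo("table", flow, table_layout="fixed", width="100%")
    for _ in rows[0]:
        fo("table-column", table, column_width="proportional-column-width(1)")
    body = fo("table-body", table)
    for row in rows:
        table_row = fo("table-row", body)
        for cell in row:
            table_cell = fo("table-cell", table_row,
                            border="0.5pt solid black", padding="2pt")
            # ElementTree escapes &, < and > in the text when serializing.
            fo("block", table_cell).text = str(cell)
    return ET.tostring(root, encoding="unicode")
```

Writing the result to a file such as report.fo and running `fop -fo report.fo -pdf report.pdf` should then produce the PDF. Text inside each fo:block wraps within its cell, which is the behaviour the question is after.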

iText is a powerful PDF library available under the MPL and LGPL. There are versions written in Java and C#, but unfortunately there is none in Python yet.

Using TeX might give you good results. I would be tempted to avoid LaTeX, myself, but that's because it's a really complicated macro package and I've never really understood it when I've tried to use it; plus, at least given my tastes back then, it seemed a very verbose way to mark up my text compared with what I was used to using in plain TeX.

The real trick will be coming up with a way to escape all of the special characters your data might include so that the TeX source file you create won't error out because you, say, use an ampersand somewhere and TeX interprets it as an out-of-place command. It would take sitting down with the TeXBook for half an hour, probably, for me to get a quoting function working perfectly.
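A rough sketch of such a quoting function in Python, offered as a starting point rather than a checked solution. The name tex_escape and the replacement table are assumptions aimed at plain TeX, where braces and the backslash are only available as math-mode symbols. It may well miss cases that a careful reading of the TeXBook would catch.

```python
# Replacements for the characters plain TeX treats specially (assumed list).
_TEX_SPECIALS = {
    "\\": r"$\backslash$",
    "{": r"$\{$",
    "}": r"$\}$",
    "$": r"\$",
    "&": r"\&",
    "#": r"\#",
    "%": r"\%",
    "_": r"\_",
    "^": r"\^{}",
    "~": r"\~{}",
}
_TEX_TABLE = str.maketrans(_TEX_SPECIALS)


def tex_escape(value):
    """Return str(value) with TeX special characters replaced.

    str.translate works in a single pass, so the backslashes and dollar
    signs introduced by one replacement are never escaped a second time.
    """
    return str(value).translate(_TEX_TABLE)
```

Each cell would be passed through tex_escape before the cells are joined with & and the rows with \cr.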

But if your data is just normal strings, then we can try printing a table without any quoting function. Here's an example:

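#!/usr/bin/env python

import subprocess

# Create a PDF table with 3 rows and 2 columns, using plain TeX.

# \halign template: left-aligned first column, right-aligned second column.
template = r"# \hfil & \hfil #"
data = [['Hydrogen', 1],
        ['Silicon', 14],
        ['Mercury', 80]]

# Cells are separated by '&' and rows are terminated by '\cr'.
table_data = r'\cr '.join('&'.join(str(i) for i in row) for row in data)

with open('table.tex', 'w') as f:
    f.write(r"\halign{" + template + r"\cr " + table_data + r"\cr}\end")

# tex writes table.dvi; dvipdf converts it to table.pdf.
subprocess.call(["tex", "table.tex"])
subprocess.call(["dvipdf", "table.dvi"])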

The big problem, as you can see from the PDF this produces (if you run it and take a look), is that the table has no borders. If you take a look at the TeXBook, you'll find that producing them, while always possible, is not the most natural or obvious of operations.

Come to think of it, maybe LaTeX would be of some use after all, if it has macros that make tables with borders easy to create. :-)
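For what it's worth, LaTeX's standard tabular environment does draw borders: a `|` in the column specification gives a vertical rule and `\hline` a horizontal one. Here is a minimal sketch of generating such a table from Python. The function name latex_table is made up for illustration, and the cell data is assumed to contain no TeX special characters.

```python
def latex_table(rows, column_spec):
    """Return a complete LaTeX document containing one bordered table."""
    lines = [
        r"\documentclass{article}",
        r"\begin{document}",
        r"\begin{tabular}{" + column_spec + "}",
        r"\hline",
    ]
    for row in rows:
        # Cells are separated by '&'; each row ends with '\\' and a rule.
        lines.append(" & ".join(str(cell) for cell in row) + r" \\ \hline")
    lines += [r"\end{tabular}", r"\end{document}"]
    return "\n".join(lines) + "\n"


data = [["Hydrogen", 1],
        ["Silicon", 14],
        ["Mercury", 80]]
source = latex_table(data, "|l|r|")
```

Writing source to table.tex and running `pdflatex table.tex` should give a bordered version of the table above. For columns with long text, a specification such as `p{3cm}` makes the cell contents wrap within a fixed width rather than run past the border.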

Have you, by the way, just looked to see if WebKit or any of the other browser backends can be made to produce PDFs directly from HTML, from the command line? They produce PDFs somehow for printing; there must be a way to take advantage of that to turn your HTML into PDF directly.

Licensed under: CC-BY-SA with attribution
