Question

I am bulk-generating PDF files from templates and quickly ran into serious performance problems. My current process is as follows:

  1. get the data to fill in from the DB
  2. create an FDF from a single data row and the PDF form
  3. write the .fdf file to disk
  4. merge the PDF with the FDF using pdftk (fill_form with the flatten option)
  5. repeat for every row until all PDFs are generated
  6. merge all the generated files at the end and send the single PDF to the client
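Step 2 can be illustrated with a minimal sketch. The question's code is PHP; Python is used here only for illustration. The sketch writes the simplest FDF layout (one object holding a `/Fields` array of `/T` name and `/V` value pairs) and assumes ASCII field names and values. Other text would need PDFDocEncoding or UTF-16BE with a BOM. The field names in the usage are hypothetical.

```python
def escape_pdf_string(value: str) -> str:
    # PDF literal strings are delimited by parentheses, so backslashes and
    # parentheses inside the value must be escaped with a backslash.
    return value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_fdf(row: dict[str, str]) -> bytes:
    """Build a minimal FDF document that fills the form fields named in `row`."""
    fields = "\n".join(
        f"<< /T ({escape_pdf_string(name)}) /V ({escape_pdf_string(str(value))}) >>"
        for name, value in row.items()
    )
    fdf = (
        "%FDF-1.2\n"
        "1 0 obj\n"
        "<< /FDF << /Fields [\n"
        f"{fields}\n"
        "] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )
    # Assumption: ASCII-only data; encode() raises on anything else
    # instead of silently producing a mis-encoded form.
    return fdf.encode("ascii")


# Usage with hypothetical field names:
fdf_bytes = build_fdf({"name": "Jane (Doe)", "city": "Paris"})
```

Building the FDF as bytes in memory (rather than as a file) keeps the option open to hand it to pdftk without touching the disk.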

I use passthru to send the raw output straight to the client (which saves writing a file), but that is only a small performance improvement. The whole operation takes about 50 seconds for 200 records, and I would like to get it down to 10 seconds or less.
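In per-record terms, the figures above work out as follows:

```latex
\frac{50\ \text{s}}{200} = 0.25\ \text{s} = 250\ \text{ms per record},
\qquad
\frac{10\ \text{s}}{200} = 0.05\ \text{s} = 50\ \text{ms per record}
```

So the goal is a fivefold speed-up. All per-record steps together (building the FDF, writing it, and running pdftk) would have to fit in about 50 ms.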

Ideally I would process all these PDFs in memory instead of writing each one to a separate file, but then I see no way to pass that data to an external tool like pdftk. Another idea was to generate one big .fdf file containing all the rows, but that does not appear to be allowed.

Am I missing something very trivial here?

I'm thankful for any advice.

PS. I know I could use a good library like PDFlib, but for now I am only considering openly licensed libraries.

EDIT:

I am still trying to figure out the syntax for building an .fdf file with multiple pages that all use the same PDF as a template. I spent a few hours on it and couldn't find any good documentation.


Solution

After being faced with the same problem for a long time (I wanted to generate my PDFs from LaTeX), I finally decided to switch to a crude but effective technique:

I generate my PDFs in two steps: first I generate HTML with a template engine such as Twig or Smarty, and second I use mPDF to turn that HTML into a PDF. I tried many other HTML-to-PDF frameworks and ended up with mPDF; it is very mature and has been developed for a long time (frequent updates, rich functionality). The benefit of this technique is that you can design your documents with CSS (mPDF has broad CSS support), with all the advantages CSS brings (http://www.csszengarden.com), and you can generate dynamic tables very easily.
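The first step (template to HTML) can be sketched roughly as follows. This uses Python's `string.Template` purely as a stand-in for Twig or Smarty, which are PHP engines. The template text and field names are hypothetical. The idea is to render all records into one HTML document, so the HTML-to-PDF converter runs once instead of once per record. It assumes the converter honours the CSS `page-break-before` property to start each record on a new page.

```python
from html import escape
from string import Template

# Hypothetical per-record template (in the answer's setup: a Twig/Smarty file).
RECORD_TEMPLATE = Template(
    '<div class="record">\n'
    "  <h1>$title</h1>\n"
    "  <p>Customer: $customer</p>\n"
    "</div>"
)

PAGE_TEMPLATE = Template(
    "<!DOCTYPE html>\n"
    '<html><head><meta charset="utf-8"></head><body>\n'
    "$body\n"
    "</body></html>"
)

# Inserted between records (not after the last one) so that no blank
# trailing page is produced.
PAGE_BREAK = '<div style="page-break-before: always;"></div>'


def render_document(rows: list[dict[str, str]]) -> str:
    """Render every row into one HTML document, one record per page."""
    parts = [
        # Escape values so data containing <, > or & cannot break the markup.
        RECORD_TEMPLATE.substitute({key: escape(value) for key, value in row.items()})
        for row in rows
    ]
    return PAGE_TEMPLATE.substitute(body=f"\n{PAGE_BREAK}\n".join(parts))


html_doc = render_document([
    {"title": "Invoice 1", "customer": "X & Y"},
    {"title": "Invoice 2", "customer": "Z"},
])
```

The resulting single HTML string is what would then be handed to mPDF in the second step.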

mPDF parses HTML tables, looks for the thead and tfoot elements, and repeats them on each page if a table is longer than one page. You can also define page header and page footer elements with dynamic entities such as the page number.
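For the table behaviour described above, the HTML only needs a proper `<thead>` and `<tfoot>` around the rows. A minimal Python sketch that builds such a table follows. The column names and footer text are hypothetical, and the repetition on every page is done by the converter, not by this code.

```python
from html import escape


def render_table(headers: list[str], rows: list[list[str]], footer: str) -> str:
    """Build an HTML table with <thead>/<tfoot> sections that an HTML-to-PDF
    converter can repeat on each page when the table spans several pages."""
    head = "".join(f"<th>{escape(h)}</th>" for h in headers)
    body = "\n".join(
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    # <tfoot> is placed before <tbody>, which is valid in both HTML 4 and HTML5.
    return (
        "<table>\n"
        f"<thead><tr>{head}</tr></thead>\n"
        f'<tfoot><tr><td colspan="{len(headers)}">{escape(footer)}</td></tr></tfoot>\n'
        f"<tbody>\n{body}\n</tbody>\n"
        "</table>"
    )


table_html = render_table(["Name", "Qty"], [["Widget", "3"], ["Gadget", "5"]], "Total: 8")
```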

I know this detour looks like a workaround, but to be honest, no LaTeX, PDF, or other engine is as powerful and simple as HTML!

OTHER TIPS

Try a different, less complex library such as FPDF (http://www.fpdf.org/).

I find it quite good and lightweight.

Always look for libraries that are small and do only what you need them to do.

The bigger the library, the more resources it consumes.

This won't help your multiple-page problem, but I notice that pdftk accepts the - character to mean 'read from standard input'.

You may be able to send the .fdf data to the pdftk process via its stdin, so that you don't have to write it to disk.
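A minimal Python sketch of that idea (the question's code is PHP, where `proc_open` provides the same stdin/stdout pipes). It assumes pdftk is on the PATH and that the template path passed in is a fillable form. `-` after `fill_form` makes pdftk read the FDF from stdin, and `output -` makes it write the filled PDF to stdout, so neither file touches the disk.

```python
import subprocess


def pdftk_fill_command(template_pdf: str) -> list[str]:
    # "fill_form -" reads the FDF from stdin; "output -" writes the PDF to stdout.
    return ["pdftk", template_pdf, "fill_form", "-", "output", "-", "flatten"]


def fill_form_in_memory(template_pdf: str, fdf: bytes) -> bytes:
    """Fill and flatten one form, passing the FDF in and the PDF out via pipes."""
    result = subprocess.run(
        pdftk_fill_command(template_pdf),
        input=fdf,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if pdftk exits with an error
    )
    return result.stdout
```

Note that a process has only one stdin, so the final step that merges many PDFs with pdftk's `cat` operation still needs the individual PDFs available as files.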

Licensed under: CC-BY-SA with attribution