Question

I'm working on an application which generates multi-page (sometimes hundreds or thousands of pages) PDF documents for printing. Each page consists of a generic template with some page-specific content superimposed (think: automatically filling in the "name" field of a paper form).

The problem, though, is that the template is fairly large (about 100 KB per page), and duplicating it on every page yields very large PDF files. Currently the PDF is generated by using rsvg-convert to convert a directory full of SVG files into a single PDF.

Is it possible to reduce the duplication by referencing the static template so that each PDF page only contains the custom content?

Ideally I'd like to know how to do this with Python or Ghostscript, but any starting points would be appreciated.


Solution

What you want are Form XObjects inside the PDF file. From the PDF Reference:

A form XObject is a PDF content stream that is a self-contained description of any sequence of graphics objects (including path objects, text objects, and sampled images). A form XObject may be painted multiple times—either on several pages or at several locations on the same page—and produces the same results each time, subject only to the graphics state at the time it is invoked. Not only is this shared definition economical to represent in the PDF file, but under suitable circumstances the PDF consumer application can optimize execution by caching the results of rendering the form XObject for repeated reuse.
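To make the quoted definition concrete, here is a minimal sketch in Python (standard library only) that writes a small PDF by hand: the template is stored once as a form XObject, and every page paints it with the Do operator before drawing its own text. The helper names (pdf_string, stream_obj, build_pdf), the resource name /Tpl, the A4 page size, the Helvetica font, and the template drawing operators are all assumptions chosen for illustration; a real template would come from your SVG or PDF source rather than a few hand-written operators.

```python
PAGE_BOX = b"[0 0 595 842]"  # A4 in points (assumed page size)

# Hypothetical static template: a border, a label and an underline.
TEMPLATE = (
    b"0.5 w 36 36 523 770 re S\n"
    b"BT /F1 12 Tf 72 700 Td (Name:) Tj ET\n"
    b"72 695 m 400 695 l S"
)


def pdf_string(text):
    """Escape ASCII text for use as a PDF literal string."""
    for ch in ("\\", "(", ")"):
        text = text.replace(ch, "\\" + ch)
    return b"(" + text.encode("ascii") + b")"


def stream_obj(dictionary, data):
    """Return the body of a stream object with the given dictionary entries."""
    return (b"<< " + dictionary + b" /Length %d >>\nstream\n" % len(data)
            + data + b"\nendstream")


def build_pdf(names, template_ops):
    """Build a PDF with one page per name; the template is stored only once."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",                        # object 1
        None,                                                        # object 2: page tree, filled in below
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",   # object 3
        stream_obj(b"/Type /XObject /Subtype /Form /BBox " + PAGE_BOX
                   + b" /Resources << /Font << /F1 3 0 R >> >>",
                   template_ops),                                    # object 4: the shared template
    ]
    kids = []
    for name in names:
        page_num = len(objects) + 1
        kids.append(b"%d 0 R" % page_num)
        # Each page only references the template (object 4) in its resources.
        objects.append(
            b"<< /Type /Page /Parent 2 0 R /MediaBox %s"
            b" /Resources << /Font << /F1 3 0 R >> /XObject << /Tpl 4 0 R >> >>"
            b" /Contents %d 0 R >>" % (PAGE_BOX, page_num + 1)
        )
        # Page content: paint the template, then the page-specific text.
        content = (b"/Tpl Do\nBT /F1 12 Tf 150 700 Td "
                   + pdf_string(name) + b" Tj ET")
        objects.append(stream_obj(b"", content))
    objects[1] = (b"<< /Type /Pages /Kids [" + b" ".join(kids)
                  + b"] /Count %d >>" % len(kids))

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += (b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (len(objects) + 1, xref_pos))
    return bytes(out)


if __name__ == "__main__":
    with open("out.pdf", "wb") as f:
        f.write(build_pdf(["Alice", "Bob", "Carol"], TEMPLATE))
```

In this sketch each additional page costs only a short page dictionary and a content stream of a few dozen bytes, however large the template stream is. That is the saving the question is after; the tools below produce the same structure from a real template.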

Many applications that add watermarks or similar overlays to PDF pages add them as Form XObjects automatically. For example, pdftk can add template content as a background to an existing multi-page PDF that already contains the page-specific content:

pdftk multipage.pdf background template.pdf output multipage+.pdf

With Ghostscript, the template should be an EPS file. You create a multi-page PDF with the template added as a Form XObject on each page, then add the page-specific content by some other method. It may also be possible to do something smarter and superimpose the page-specific content onto the background using Ghostscript only. To create a "ready to be filled" multi-page PDF with the template as a Form XObject on each page, do something like this:

gs -sDEVICE=pdfwrite -o 100_pages_template.pdf \
-c '[/_objdef {background} /BBox [0 0 595 841] /BP pdfmark
save /showpage {} def
0 0 translate       % adjust according to EPS BBox
(template.eps) run
restore
[/EP pdfmark
1 1 100 {
  pop               % discard the loop counter that "for" pushes
  [{background} /SP pdfmark
  showpage
} for'

I don't know about Python, but I expect it is about as easy as the following Perl example, which also creates a 100-page PDF with the template on each page:

use strict;
use warnings;
use PDF::API2;

my $pdf = PDF::API2->new();
my $tmpl = PDF::API2->open('template.pdf');
my $xo = $pdf->importPageIntoForm($tmpl, 1);
for (1..100) {
    my $page = $pdf->page();
    my $gfx = $page->gfx();
    $gfx->formimage($xo, 0, 0);

    # add page-specific content here

}
$pdf->saveas('out.pdf');
Licensed under: CC-BY-SA with attribution