Question

I have created a script that combines two PDFs into one side by side, by looking at some of Kurt Pfeifle's answers.

But my problem is that the code isn't flexible. By that I mean that if one PDF is larger or has a different resolution than the other, the output (side-by-side) PDF will be bad.

Illustrated it looks like this:

Input file: a.pdf
+--------+ 
|        |
|  a     |
|        |
+--------+

Input file: b.pdf
+--------+ 
|        |
|  b     |
|        |
+--------+

Desired output file: compare.pdf
+--------+--------+ 
|        |        |
|   a    |  b     |
|        |        |
+--------+--------+

So do I need to make sure that both PDFs have the same regular A4 page size and resolution before I combine them? I have tried many commands and scripts, but can't figure this one out. How can I do that? The script needs to be bulletproof so that any PDFs can be used and compared, even if they don't have the same size.

My script looks like this now and works on some PDFs with the same size and resolution:

gswin64c.exe                        ^
          -o c.pdf                  ^
          -sDEVICE=pdfwrite         ^
          -g11690x8270              ^
          -dFIXEDMEDIA              ^
          -dPDFSETTINGS=/prepress   ^
          -r300                     ^
          -c "<</PageOffset [0 0]>>setpagedevice" ^
          -f a.pdf

This creates c.pdf, looking like this:

c.pdf
+--------+--------+ 
|        |        |
|   a    | (empty)|
|        |        |
+--------+--------+

Next command:

gswin64c.exe                       ^
          -o left-side-outputs.pdf ^
          -sDEVICE=pdfwrite        ^
          -g11690x8270             ^
          -dPDFSETTINGS=/prepress  ^
          -c "<</PageOffset [0 0]>>setpagedevice" ^
          -f b.pdf

This creates left-side-outputs.pdf, looking like this:

left-side-outputs.pdf
+--------+--------+ 
|        |        |
|   b    | (empty)|
|        |        |
+--------+--------+

Next command:

gswin64c.exe                        ^
          -o right-side-outputs.pdf ^
          -sDEVICE=pdfwrite         ^
          -g11690x8270              ^
          -dPDFSETTINGS=/prepress   ^
          -c "<</PageOffset [596 0]>>setpagedevice" ^
          -f c.pdf

This creates right-side-outputs.pdf, looking like this:

right-side-outputs.pdf
+--------+--------+ 
|        |        |
|(empty) |  b     |
|        |        |
+--------+--------+

Last command:

pdftk left-side-outputs.pdf multistamp right-side-outputs.pdf output compare.pdf

This creates the final result, compare.pdf:

Desired output file: compare.pdf
+--------+--------+ 
|        |        |
|   a    |  b     |
|        |        |
+--------+--------+

I hope some gurus out there can help me figure out how to handle PDF input files with different page sizes.

Solution

To your question...

So I need to make sure that the PDFs both have the same regular A4 size PDF and resolution before I combine them?

...the answer is 'Yes, regarding the page size -- No regarding the resolution (doesn't matter).'

Scaling PDF pages with Ghostscript (1)

A command to scale all pages of a mixed-size PDF to all-A4 is this:

 gswin64c.exe           ^
     -o all-a4.pdf      ^
     -sDEVICE=pdfwrite  ^
     -g5950x8420        ^
     -dPDFFitPage       ^
     -f input.pdf

This scales media sizes and contents likewise (tested with GS v9.10).

The parameter -dPDFFitPage will always keep the aspect ratio. It will automatically rotate the content to make the best fit. It does not allow 'stretching' of the page in one direction only. This can however be achieved with the next method.


[Update

I think I did not get one point about this method across clearly enough.

The thing is this: if the aspect ratio of the media in your input file is not already the same as your target media's, then -dPDFFitPage will not entirely cover your target media.

Assume your input medium uses a square page size, 500x500 points. If you process this with a target size of A4 (-g5950x8420), then -dPDFFitPage will keep the square aspect ratio, so the scaled content covers only 5950x5950 pixels of the target page.

But you cannot leave out -dPDFFitPage either -- otherwise your original 500x500 content does not get scaled, but is only placed on the bigger 595x842 page, in the lower left corner.

End of update.]
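
To make the arithmetic of the update concrete, here is a minimal Python sketch of an aspect-preserving fit. It is only a model of the behaviour described above, not Ghostscript's actual code: the function name fit_page is made up for illustration, and the automatic rotation that -dPDFFitPage also performs is ignored.

```python
def fit_page(src_w, src_h, dst_w, dst_h):
    """Uniformly scale a src page so it fits inside dst (sizes in points).

    Illustrative model only: keeps the aspect ratio, ignores rotation.
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    return scale, (src_w * scale, src_h * scale)


# The square 500x500 pt page from the update, fitted onto A4 (595x842 pt).
scale, (w, h) = fit_page(500, 500, 595, 842)
print(f"scale={scale:.3f}, content covers {w:.0f}x{h:.0f} pt of 595x842 pt")
```

The smaller of the two ratios wins, which is why the square page ends up 595 pt wide and leaves part of the A4 height unused.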


Scaling PDF pages with Ghostscript (2)

A command to scale all PDF page contents to 50% of both their respective dimensions is this:

 gswin64c.exe                                      ^
     -o 50pc.pdf                                   ^
     -sDEVICE=pdfwrite                             ^
     -c "<</Install {.5 .5 scale}>> setpagedevice" ^
     -f input.pdf

However, this will NOT scale the media boxes at the same time!

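If you know that all pages in your PDF file are of the same size, you could use this to scale the contents of an A3 PDF to 50% and place them on A4 media (note that 50% per dimension shrinks A3 content to A5 size; to fill the A4 page, use the .707 factor listed further below):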

 gswin64c.exe                                      ^
     -o A4-50pc.pdf                                ^
     -g5950x8420                                   ^
     -sDEVICE=pdfwrite                             ^
     -c "<</Install {.5 .5 scale} /AutoRotatePages /None>> setpagedevice" ^
     -f A3.pdf

However, the first command in my answer will of course also work, and it is simpler to use!

For A5 -> A4 or A4 -> A3 use:

                    {1.415 1.415 scale}

For A3 -> A4 or A4 -> A5:

                    { .707  .707 scale}

But here it gets more interesting now, because you can 'stretch' the contents as well! To scale horizontally to 75% and vertically to 66%, use

     -c "<</Install {.75 .666 scale}>> setpagedevice"

For a kind of 'liquid' scaling between Letter and A4, you may use these:

  • A4 -> Letter: {1.028571 .940617 scale}
  • Letter -> A4: { .972222 1.063131 scale}
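
The 'liquid' factors are just the ratios of the target to the source dimensions. A small Python sketch of that calculation follows; it assumes the usual point sizes of 595x842 for A4 and 612x792 for Letter, and the helper names liquid_scale and install_arg are made up for illustration.

```python
# Page sizes in PostScript points (1/72 inch); assumed standard values.
PAPER_PT = {"A4": (595, 842), "Letter": (612, 792)}


def liquid_scale(src, dst):
    """Return independent (x, y) scale factors that stretch src onto dst."""
    src_w, src_h = PAPER_PT[src]
    dst_w, dst_h = PAPER_PT[dst]
    return dst_w / src_w, dst_h / src_h


def install_arg(sx, sy):
    """Format the factors as the PostScript snippet passed after -c."""
    return f"<</Install {{{sx:.6f} {sy:.6f} scale}}>> setpagedevice"


sx, sy = liquid_scale("Letter", "A4")
print(install_arg(sx, sy))
```

Swapping the two arguments of liquid_scale gives the A4 -> Letter factors; the last digit can differ from the list above depending on rounding versus truncation.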

For all of the above you can give a -gNNNNxMMMM value. It determines a fixed page size for the output PDF, with dimensions in pixels at the default internal resolution of the pdfwrite device, which is 720 ppi, so that 1 PostScript point corresponds to 10 pixels.
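
As a quick check of the point-to-pixel conversion, here is a minimal Python sketch that derives a -g value from a page size in points. It assumes the 720 ppi default mentioned above and no -r switch on the command line; g_option is a hypothetical helper name.

```python
def g_option(width_pt, height_pt, dpi=720):
    """Build a -gWxH switch from a page size given in points (1/72 inch)."""
    px_w = round(width_pt * dpi / 72)
    px_h = round(height_pt * dpi / 72)
    return f"-g{px_w}x{px_h}"


print(g_option(595, 842))   # A4 portrait
print(g_option(1190, 842))  # two A4 pages side by side
```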

If you do not give a -gNNNNxMMMM value, the original page sizes are used (even if they are of mixed values), but their content will be drawn upon these pages with the scaling factor you specified.

What I do not know right now: A method to 'liquid-scale' each individual page of a mixed sized PDF including the media sizes in one go...

Comparing all-Letter with all-A5 PDF files, based on A4:

Assuming you now want to compare an all-Letter sized PDF to one which is all-A5, and you want to scale both to A4 first, here is what you'd do:

'Liquid'-Scale Letter to A4:

 gswin64c.exe                                      ^
     -o a4-1.pdf                                   ^
     -sDEVICE=pdfwrite                             ^
     -g5950x8420                                   ^
     -c "<</Install{.972222 1.063131 scale}>>setpagedevice" ^
     -f letter.pdf

'Fixed'-Scale A5 to A4:

 gswin64c.exe                                      ^
     -o a4-2.pdf                                   ^
     -sDEVICE=pdfwrite                             ^
     -g5950x8420                                   ^
     -c "<</Install{1.415 1.415 scale}>>setpagedevice" ^
     -f a5.pdf

or, alternatively:

 gswin64c.exe          ^
     -o a4-2.pdf       ^
     -sDEVICE=pdfwrite ^
     -g5950x8420       ^
     -dPDFFitPage      ^
     -f a5.pdf

And now compare both your A4 PDF files....

Optimising your comparison workflow

You can also save one step of the workflow as outlined in your question. Here is a better approach.

First step: prepare left sides (as before)

Assuming you have A4 input, and the final output should be A3:

 gswin64c.exe                   ^
      -o left-sides.pdf         ^
      -sDEVICE=pdfwrite         ^
      -g11900x8420              ^
      -c "<</PageOffset [0 0]>>setpagedevice" ^
      -f a.pdf

This creates:

left-sides.pdf
+--------+--------+   ^
|        |        |   |
|        |        |   |
|  a     |(empty) |  842 pt == 8420 pixels
|        |        |   |
|        |        |   |
+--------+--------+   v

<-----1190 pt----->
   == 11900 pixels

Second step: prepare right sides (all in one go)

 gswin64c.exe                   ^
      -o right-sides.pdf        ^
      -sDEVICE=pdfwrite         ^
      -g11900x8420              ^
      -c "<</PageOffset [595 0]>>setpagedevice" ^
      -f b.pdf

This creates:

right-sides.pdf
+--------+--------+   ^
|        |        |   |
|        |        |   |
|(empty) |  b     |  842 pt == 8420 pixels
|        |        |   |
|        |        |   |
+--------+--------+   v

<-----1190 pt----->
   == 11900 pixels

Third step: overlay the two files with pdftk

pdftk right-sides.pdf multistamp left-sides.pdf output compare.pdf

or

pdftk left-sides.pdf multistamp right-sides.pdf output compare2.pdf

This creates:

compare.pdf
+--------+--------+   ^
|        |        |   |
|        |        |   |
|  a     |  b     |  842 pt == 8420 pixels
|        |        |   |
|        |        |   |
+--------+--------+   v

<-----1190 pt----->
   == 11900 pixels
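
If you want to script the three steps, the commands can be assembled from the page size. The following Python sketch only builds the argument lists; it assumes both inputs are already A4 (595x842 pt), that gswin64c.exe and pdftk are on the PATH, and the function name side_by_side_commands is made up for illustration.

```python
def side_by_side_commands(left_pdf, right_pdf, page_w_pt=595, page_h_pt=842,
                          gs="gswin64c.exe"):
    """Return the two Ghostscript calls and the pdftk call as argument lists."""
    # pdfwrite works at 720 ppi by default, i.e. 10 pixels per point.
    geometry = f"-g{2 * page_w_pt * 10}x{page_h_pt * 10}"

    def gs_cmd(out, offset_x, src):
        return [gs, "-o", out, "-sDEVICE=pdfwrite", geometry,
                "-c", f"<</PageOffset [{offset_x} 0]>>setpagedevice",
                "-f", src]

    return [
        gs_cmd("left-sides.pdf", 0, left_pdf),
        gs_cmd("right-sides.pdf", page_w_pt, right_pdf),
        ["pdftk", "right-sides.pdf", "multistamp", "left-sides.pdf",
         "output", "compare.pdf"],
    ]


# Print the commands for inspection (not shell-quoted).
for cmd in side_by_side_commands("a.pdf", "b.pdf"):
    print(" ".join(cmd))
```

To actually run them, each list could be passed to subprocess.run(cmd, check=True), which avoids any shell quoting problems with the PostScript snippet.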

Update regarding Crop-/Trim-/Art-/Bleed-Boxes

One more thing.

Sometimes the above commands may not "seem" to work. The reason is that PDFs internally do not only use the naively assumed "page size", but a more complex setup of MediaBox (what we usually regard as "page size"), as well as TrimBox, BleedBox, ArtBox and CropBox. See here for an exact description of these boxes...

To test your PDF files (inputs as well as results or intermediate results) for all these boxes' values, use the pdfinfo command:

pdfinfo -f 1 -l 5 -box a.pdf
pdfinfo -f 1 -l 5 -box b.pdf
pdfinfo -f 1 -l 5 -box right-sides.pdf
pdfinfo -f 1 -l 5 -box left-sides.pdf
pdfinfo -f 1 -l 5 -box compare.pdf

A CropBox that is defined differently from the MediaBox can get in the way of the rescaling task: it makes PDF viewers (and printers) display (or print) only the part of the page content that lies inside it. Ghostscript will not touch a CropBox if it sees one.

It can happen that the file was processed successfully, but the viewer still shows you the same viewport onto the page.

In order to "disarm" the effect of these boxes, you can use a very crude trick: rename these strings within the PDF to all-lowercase names. Here is how to do it with the sed command line (may not be available on Windows):

cat input.pdf                    \
   | sed 's#CropBox#cropbox#g'   \
   | sed 's#TrimBox#trimbox#g'   \
   | sed 's#BleedBox#bleedbox#g' \
   | sed 's#ArtBox#artbox#g'     \
> disarmed.pdf

or, somewhat shorter, but not as easy to read:

sed 's#CropB#cropb#g;s#TrimB#trimb#g;s#BleedB#bleedb#g;s#ArtB#artb#g' \
  in.pdf > out.pdf

Since PDF is a binary file format, with some versions of sed you may encounter an error message saying:

sed: RE error: illegal byte sequence

In this case try a different flavor, like GNU sed, gsed...
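
If no suitable sed is at hand (for example on Windows), the same crude trick can be done on raw bytes. Below is a minimal Python sketch; the names disarm_boxes and disarm_file are made up for illustration. Like the sed version, it blindly renames every occurrence of these strings and will not see keys that are hidden inside compressed object streams.

```python
from pathlib import Path

# The same four names the sed commands rewrite, as byte strings.
BOX_KEYS = (b"CropBox", b"TrimBox", b"BleedBox", b"ArtBox")


def disarm_boxes(data: bytes) -> bytes:
    """Lowercase the box names; the length stays the same, so offsets hold."""
    for key in BOX_KEYS:
        data = data.replace(key, key.lower())
    return data


def disarm_file(src: str, dst: str) -> None:
    """Read src as raw bytes, disarm the boxes and write the result to dst."""
    Path(dst).write_bytes(disarm_boxes(Path(src).read_bytes()))
```

Calling disarm_file("input.pdf", "disarmed.pdf") would then mirror the first sed pipeline. Working on bytes sidesteps the "illegal byte sequence" problem entirely, because nothing is ever decoded as text.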

OTHER TIPS

PDF files don't contain a resolution, so that can't be a problem. I wouldn't normally use -r with Ghostscript either; all it does is specify the resolution at which any content that cannot be emitted 'as is' into the PDF file is rendered in order to turn it into an image. It doesn't affect the size or placement of that content.

You shouldn't need /PageOffset, I don't think that will have any effect at all (if the input is PDF).

I would NOT use /PDFSETTINGS. By using that you are importing all kinds of canned settings. Unless you are confident that these are all exactly what you want, you are much better off using the defaults and flipping any switches you want changed individually.

You may very well want to set -dAutoRotatePages=/None, because otherwise pdfwrite will try to make the majority of the text run horizontally, left to right.

You are converting one of the files twice, you should try to avoid that, the more conversions the more likelihood of problems.

You have specified media sizes on all three Ghostscript inputs, but you haven't specified FIXEDMEDIA on two of them. For one that's probably fine because it's a reprocessing of the first one (where you do specify FIXEDMEDIA), but what about the second instance?

You don't actually say what problem you are experiencing. Nor do you say if the problem shows up in the individual files, or only when you use pdftk to merge them together. Without that information, and some sample files that demonstrate the problem, it's really not possible to give you any more guidance.

Oh and in passing, you could actually do n-up imposition like this with Ghostscript directly, though you'd have to do more work than you do using pdftk. With a little effort I could probably do the whole thing in one Ghostscript invocation.

Licensed under: CC-BY-SA with attribution