Bash script to merge PDFs with pdftk

https://stackoverflow.com/questions/16652518

bash
pdftk

30-05-2022

Question

I have a few thousand PDFs that I need merged based on filename.

Named like:

Lastname, Firstname_12345.pdf

Instead of overwriting or appending to the existing file, our software adds a number/datetime to the file name when there are additional pages, like:

Lastname, Firstname_12345_201305160953344627.pdf

The script doesn't need to touch the files that have no second (or third) PDF. The ones that have multiples need to be merged into a new file (*_merged.pdf?) and the originals deleted.

I gave this my best effort and this is what I have so far.

#! /bin/bash

# list all pdfs to show shortest name first
LIST=$(ls -r *.pdf)
for x in "$LIST"

# Remove .pdf extension. merge pdfs. delete originals.
do
    y=${x%%.*}
    pdftk "$y"*.pdf cat output "$y"_merged.pdf
    find "$y"*.pdf -type f ! -iname "*_merged.pdf" -delete
done

This script works to a certain extent. It will merge and delete the originals, but it doesn't have anything in it to skip ones that don't need anything appended to them, and when I run it in a folder with several test files it stops after one file. Can anyone point me in the right direction?

Solution

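Since your file names contain spaces, the for loop won't work as is. Quoted, "$LIST" reaches the loop as one single item holding every file name, which is why the script stops after one pass; unquoted, it would be split at every space. Storing the names in an array keeps each one intact.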
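To make the difference concrete, here is a minimal sketch comparing the three ways of looping. The file names are made up for illustration and nothing is read from disk.

```shell
#!/bin/bash
# Hypothetical file names, standing in for the output of ls.
names=$'Doe, Jane_1.pdf\nRoe, Rick_2.pdf'

# Quoted string: the body runs once, with both names in a single value.
string_iterations=0
for x in "$names"; do
    string_iterations=$((string_iterations + 1))
done

# Unquoted string: split on every space and newline, so names are cut apart.
unquoted_iterations=0
for x in $names; do
    unquoted_iterations=$((unquoted_iterations + 1))
done

# Array: one element per file name, spaces preserved.
files=( "Doe, Jane_1.pdf" "Roe, Rick_2.pdf" )
array_iterations=0
for x in "${files[@]}"; do
    array_iterations=$((array_iterations + 1))
done

# prints "quoted: 1, unquoted: 4, array: 2"
echo "quoted: $string_iterations, unquoted: $unquoted_iterations, array: $array_iterations"
```

Only the array loop sees exactly two items, one per file.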

Once you have the list of file names, test how many files match "$y"*.pdf to determine whether the PDFs need to be merged.

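    #!/bin/bash

    # Collect the PDF names in an array so that names with spaces stay intact.
    LIST=( *.pdf )

    # Remove .pdf extension. merge pdfs. delete originals.
    for x in "${LIST[@]}" ; do
        y=${x%%.pdf}
        # Merge only when more than one file starts with this name.
        if [ $(ls "$y"*.pdf 2>/dev/null | wc -l ) -gt 1 ]; then
            # Delete the originals only if pdftk succeeded.
            pdftk "$y"*.pdf cat output "$y"_merged.pdf &&
                find "$y"*.pdf -type f ! -iname "*_merged.pdf" -delete
        fi
    done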
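
Counting with ls and wc -l gives a wrong number if a file name ever contains a newline. One possible alternative, sketched here on the assumption that bash arrays and the nullglob option are available, is to collect the glob matches in an array and look at its length. The helper name count_matches and the file names are hypothetical and not part of the original answer.

```shell
#!/bin/bash
# count_matches PREFIX: print how many files in the current directory
# match PREFIX*.pdf. The body runs in a subshell, so enabling nullglob
# does not leak into the caller. (Hypothetical helper for illustration.)
count_matches() (
    shopt -s nullglob            # an unmatched glob expands to nothing
    matches=( "$1"*.pdf )
    echo "${#matches[@]}"
)

# Demonstration in a scratch directory with made-up file names.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT   # remove the scratch directory on exit
cd "$demo_dir" || exit 1
touch "Doe, Jane_12345.pdf" \
      "Doe, Jane_12345_201305160953344627.pdf" \
      "Roe, Rick_67890.pdf"

count_matches "Doe, Jane_12345"  # prints 2: these need merging
count_matches "Roe, Rick_67890"  # prints 1: nothing to merge
```

In the loop, the condition could then be written as [ "$(count_matches "$y")" -gt 1 ].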

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow