Question

I have a few thousand PDFs that I need merged based on filename.

Named like:

Lastname, Firstname_12345.pdf

Instead of overwriting or appending, our software adds a number/datetime to the file name if there are additional pages, like:

Lastname, Firstname_12345_201305160953344627.pdf

The script doesn't need to touch files that have no second (or third) PDF. Files that do have multiples need to be merged into a new file, *_merged.pdf, and the originals deleted.

I gave this my best effort and this is what I have so far.

#! /bin/bash

# list all pdfs to show shortest name first
LIST=$(ls -r *.pdf)
for x in "$LIST"

# Remove .pdf extension. merge pdfs. delete originals.
do
    y=${x%%.*}
    pdftk "$y"*.pdf cat output "$y"_merged.pdf
    find "$y"*.pdf -type f ! -iname "*_merged.pdf" -delete
done

This script works to a certain extent. It will merge and delete the originals, but it doesn't have anything in it to skip ones that don't need anything appended to them, and when I run it in a folder with several test files it stops after one file. Can anyone point me in the right direction?

Solution

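Since your file names contain spaces, the for loop won't work as is. The quoted "$LIST" is a single string, so the loop body runs only once with every name in it, and leaving it unquoted would split each name at the space. Storing the names in an array avoids both problems.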
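
To see the difference concretely, here is a small sketch that counts loop iterations for each approach. The file names ("Doe, Jane_...") are made up for illustration, and the files are empty placeholders created in a temporary directory, so nothing in your real folder is touched.

```shell
#!/bin/bash
# Demonstration only: two empty files with made-up names containing spaces.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch "Doe, Jane_12345.pdf" "Doe, Jane_12345_201305160953344627.pdf"

# Quoted string: the loop body runs once, with every name in a single value.
LIST=$(ls -r *.pdf)
quoted=0
for x in "$LIST"; do quoted=$((quoted + 1)); done

# Unquoted string: word splitting breaks each name at the space.
unquoted=0
for x in $LIST; do unquoted=$((unquoted + 1)); done

# Array filled from a glob: one element per file, spaces preserved.
files=( *.pdf )
array=0
for x in "${files[@]}"; do array=$((array + 1)); done

echo "quoted=$quoted unquoted=$unquoted array=$array"
# prints: quoted=1 unquoted=4 array=2
```

Only the array form yields one iteration per file, which is why the script below uses it.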

Once you have a list of file names, test the number of files matching "$y"*.pdf to determine whether you need to merge the PDFs.

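    #!/bin/bash

    # Store the names in an array so that spaces in file names are preserved.
    LIST=( *.pdf )

    for x in "${LIST[@]}" ; do
        # Strip the .pdf extension to get the prefix shared by related files.
        y=${x%.pdf}
        # Merge only when more than one PDF starts with this prefix.
        if [ $(ls "$y"*.pdf 2>/dev/null | wc -l ) -gt 1 ]; then
            # Delete the originals only if pdftk succeeded.
            pdftk "$y"*.pdf cat output "$y"_merged.pdf &&
                find "$y"*.pdf -type f ! -iname "*_merged.pdf" -delete
        fi
    done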
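
One thing to watch with prefix matching: a prefix such as "Roe, Sam_1234" also matches "Roe, Sam_12345.pdf", so two different IDs could end up merged together. If that can happen in your data, a possible alternative is to group files by an explicit base name first. The sketch below only reports which groups have more than one file and never calls pdftk or deletes anything. It assumes bash 4+ (associative arrays) and that every name ends in `_<digits>` optionally followed by one more `_<digits>` stamp. `base_of` and `groups_to_merge` are hypothetical helper names, not part of any tool.

```shell
#!/bin/bash
# Sketch: count PDFs per base name, then report groups with more than one file.
# Requires bash 4+ for associative arrays.

# Hypothetical helper: strip ".pdf" and an optional trailing "_<digits>" stamp
# that follows the "_<digits>" ID, e.g. "A, B_12345_2013.pdf" -> "A, B_12345".
base_of() {
    local name=${1%.pdf}
    if [[ $name =~ ^(.*_[0-9]+)_[0-9]+$ ]]; then
        name=${BASH_REMATCH[1]}
    fi
    printf '%s\n' "$name"
}

# Hypothetical helper: print the base name of every group with two or more PDFs.
groups_to_merge() {
    local -A count=()
    local f base
    for f in "$@"; do
        [[ $f == *_merged.pdf ]] && continue   # skip output of earlier runs
        base=$(base_of "$f")
        count["$base"]=$(( ${count["$base"]:-0} + 1 ))
    done
    for base in "${!count[@]}"; do
        if [ "${count["$base"]}" -gt 1 ]; then
            printf '%s\n' "$base"
        fi
    done
}
```

Running `groups_to_merge *.pdf` in the folder lists the base names that need merging, which is a cheap way to check the grouping before any file is merged or deleted.
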
Licensed under: CC-BY-SA with attribution