Question

I am trying to write a bash script that merges all PDF files in a directory into a single PDF file. The command pdfunite *.pdf output.pdf does this, but it merges the input documents in lexicographic order:

1.pdf
10.pdf
11.pdf
2.pdf
3.pdf
4.pdf
5.pdf
6.pdf
7.pdf
8.pdf
9.pdf

whereas I'd like the documents to be merged in numerical order:

1.pdf
2.pdf
3.pdf
4.pdf
5.pdf
6.pdf
7.pdf
8.pdf
9.pdf
10.pdf
11.pdf

I guess a command mixing ls -v or sort -n and pdfunite would do the trick but I don't know how to combine them. Any idea on how I could merge pdf files with a numerical sort?

Solution

You can embed the output of a command with $(), so you can do the following:

$ pdfunite $(ls -v *.pdf) output.pdf

or

$ pdfunite $(ls *.pdf | sort -n) output.pdf

However, note that this does not work when a file name contains special characters such as whitespace.
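
To see why whitespace is a problem, here is a small sketch that counts how many arguments a command would receive. The helper count_args and the sample file names are hypothetical, made up for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of arguments it was given.
count_args() { echo "$#"; }

# Throwaway directory with one plain name and one name containing a space.
demo_dir=$(mktemp -d)
touch "$demo_dir/1.pdf" "$demo_dir/2 draft.pdf"

# Unquoted $(...) is split on whitespace, so "2 draft.pdf" becomes two words.
split_count=$(cd "$demo_dir" && count_args $(ls -v *.pdf))
# A plain glob keeps every file name as a single argument.
glob_count=$(cd "$demo_dir" && count_args *.pdf)

echo "via \$(ls -v): $split_count arguments"   # 3
echo "via glob: $glob_count arguments"         # 2

rm -r "$demo_dir"
```

With the command substitution, pdfunite would be handed "2" and "draft.pdf" as two separate (nonexistent) files.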

In that case, you can do the following:

ls -v *.pdf | bash -c 'IFS=$'"'"'\n'"'"' read -d "" -ra x;pdfunite "${x[@]}" output.pdf'

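Although it looks a little complicated, it is just a combination of two steps: read, with IFS set to a newline and -d "", loads all the file names into the array x, one per line, and "${x[@]}" then passes each name to pdfunite as a single argument.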

Note that you cannot simply use xargs, since pdfunite needs the input PDFs in the middle of its argument list, before the output file. I avoided readarray because it is not supported in older versions of bash (before 4.0), but you can use it instead of IFS=.. read -ra .. if you have a newer bash.
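
If a newer bash is available, the readarray variant mentioned above might look like the following sketch. It assumes bash 4.4 or later (for readarray -d '') and GNU sort (for -z and -V); the function names sorted_pdfs and merge_pdfs_numeric are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Sketch, assuming bash >= 4.4 and GNU sort (-z, -V).

# Print the *.pdf names in the current directory, NUL-delimited and
# version-sorted, skipping the file named in $1 (the intended output).
sorted_pdfs() {
    local skip=$1 f
    local -a names=()
    for f in *.pdf; do
        # -e guards against the unexpanded pattern when nothing matches
        [[ -e $f && $f != "$skip" ]] && names+=("$f")
    done
    (( ${#names[@]} )) && printf '%s\0' "${names[@]}" | sort -zV
    return 0
}

# Merge all PDFs of the current directory into $1 in numeric order.
merge_pdfs_numeric() {
    local out=$1
    local -a files
    # -d '' splits on NUL, so spaces and newlines in names survive
    readarray -d '' -t files < <(sorted_pdfs "$out")
    if (( ${#files[@]} == 0 )); then
        echo "no input PDFs found" >&2
        return 1
    fi
    pdfunite "${files[@]}" "$out"
}

# Usage (run inside the directory that holds the PDFs):
#   merge_pdfs_numeric output.pdf
```

Because the names come from a glob rather than from parsing ls output, file names with whitespace are passed to pdfunite intact.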

Other tips

Do it in multiple steps. I am assuming you have files from 1 to 99.

 pdfunite $(find ./ -regex ".*[^0-9][0-9][^0-9].*"  | sort) out1.pdf
 pdfunite out1.pdf $(find ./ -regex ".*[^0-9]1[0-9][^0-9].*"  | sort) out2.pdf
 pdfunite out2.pdf $(find ./ -regex ".*[^0-9]2[0-9][^0-9].*"  | sort) out3.pdf

and so on.

The final file will consist of all your PDFs in numerical order.

!!! Be sure to give an output file name such as out1.pdf in each step, otherwise pdfunite will overwrite the last input file !!!

Edit: Sorry I was missing the [^0-9] in each regex. Corrected it in the above commands.

You can also rename your documents with zero-padded numbers, e.g. 001.pdf, 002.pdf and so on, so that lexicographic order matches numerical order.
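
A minimal sketch of such a rename, assuming the file names consist only of digits plus the .pdf extension. The function name pad_pdf_names and the default width of 3 are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: rename purely numeric PDF names (1.pdf, 10.pdf, ...) to a
# fixed width (001.pdf, 010.pdf, ...). Width defaults to 3 (an assumption).
pad_pdf_names() {
    local dir=${1:-.} width=${2:-3} f base padded
    for f in "$dir"/*.pdf; do
        base=$(basename "$f" .pdf)
        # Only touch names that consist of digits
        [[ $base =~ ^[0-9]+$ ]] || continue
        # 10# forces base 10 so a name like 08 is not read as octal
        printf -v padded '%0*d' "$width" "$((10#$base))"
        # Nothing to do if the name is already padded
        [[ $base == "$padded" ]] && continue
        # -n refuses to overwrite an existing file
        mv -n -- "$f" "$dir/$padded.pdf"
    done
}

# Usage:
#   pad_pdf_names .        # 1.pdf -> 001.pdf, 10.pdf -> 010.pdf
```

After renaming, the glob order matches the numeric order, so pdfunite *.pdf output.pdf from the question should work as long as output.pdf does not already exist in the directory.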

destfile=combined.pdf
find . -maxdepth 1 -type f -name '*.pdf' -print0 \
   | sort -z -t '/' -k2n \
   | { cat; printf '%s\0' "$destfile"; } \
   | xargs -0 -x pdfunite

  1. The variable destfile holds the name of the destination PDF file.
  2. The find command finds all the PDF files in the current directory and outputs them as a NUL-delimited list.
  3. The sort command reads the NUL-delimited list of file names. It specifies a field delimiter of /. It sorts by the 2nd field numerically. (Recall that the output of find looks like ./11.pdf ....)
  4. We append destfile before sending the list to xargs, being sure to end it with a NUL.
  5. xargs reads the NUL-delimited arguments and supplies them to the pdfunite command. We supply the -x option so that xargs will exit if the command line is too long. We don't want xargs to execute a partially constructed command.

This solution handles filenames with embedded newlines and spaces.

Licensed under: CC-BY-SA with attribution