Question

I want to remove all the images from a PDF, leaving only the text (fonts), using whatever command-line tool is possible.

I tried using -dGraphicsAlphaBits=1 in a Ghostscript command, but the images are still present, just rendered as big pixels.


Solution 3

No. AFAIK, it's not possible to remove all images from a PDF with a command-line tool. (But see the updates at the end of this answer.)

What's the purpose of your request anyway? Save on filesize? Remove information contained in images? Or ...?

Workaround

Whatever you aim at, here is a command that will downsample all images to a resolution of 2 ppi (Update: 1 ppi doesn't work), which achieves two goals at once:

  • it reduces the file size
  • it makes all images basically incomprehensible

Here's how to do it selectively, for only the images on page 33 of original.pdf:

gs                               \
  -o images-uncomprehendable.pdf \
  -sDEVICE=pdfwrite              \
  -dDownsampleColorImages=true   \
  -dDownsampleGrayImages=true    \
  -dDownsampleMonoImages=true    \
  -dColorImageResolution=2       \
  -dGrayImageResolution=2        \
  -dMonoImageResolution=2        \
  -dFirstPage=33                 \
  -dLastPage=33                  \
   original.pdf

If you want to do it for all images on all pages, just skip the -dFirstPage and -dLastPage parameters.
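
For the all-pages case, the command above without the two page-range switches could be wrapped roughly like this. It is only a sketch: the function name, the GS override variable and the file names are made up for illustration, while the Ghostscript switches are exactly the ones from the command above.

```shell
# Downsample every image on every page to RES ppi.
# RES defaults to 2, the lowest value this answer reports as working.
# GS is a hypothetical override for the Ghostscript binary;
# setting GS=echo prints the arguments instead of running gs.
downsample_all_images() {
  src=$1
  dst=$2
  res=${3:-2}
  "${GS:-gs}" \
    -o "$dst" \
    -sDEVICE=pdfwrite \
    -dDownsampleColorImages=true \
    -dDownsampleGrayImages=true \
    -dDownsampleMonoImages=true \
    "-dColorImageResolution=$res" \
    "-dGrayImageResolution=$res" \
    "-dMonoImageResolution=$res" \
    "$src"
}
```

Calling `downsample_all_images original.pdf images-uncomprehendable.pdf` should then process the whole document; prefix the call with `GS=echo` to check the generated arguments first.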

If you want to remove all color information from the images, convert them to grayscale in the same command (search for other answers on Stack Overflow where the details are discussed).
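
As a rough sketch of the grayscale variant: pdfwrite accepts -sColorConversionStrategy=Gray together with -dProcessColorModel=/DeviceGray. Note the assumptions: as far as I know these two switches convert everything on the page (text and vector art too), not only the images, and the behaviour may differ between Ghostscript versions. The function name and the GS override variable are hypothetical.

```shell
# Downsample all images to 2 ppi and convert the page content to grayscale.
# Assumption: the two color switches apply to the whole document,
# not just to raster images.
# GS is a hypothetical override; GS=echo prints the arguments (dry run).
downsample_to_gray() {
  src=$1
  dst=$2
  "${GS:-gs}" \
    -o "$dst" \
    -sDEVICE=pdfwrite \
    -sColorConversionStrategy=Gray \
    -dProcessColorModel=/DeviceGray \
    -dDownsampleColorImages=true \
    -dDownsampleGrayImages=true \
    -dDownsampleMonoImages=true \
    -dColorImageResolution=2 \
    -dGrayImageResolution=2 \
    -dMonoImageResolution=2 \
    "$src"
}
```

Usage would be something like `downsample_to_gray original.pdf gray.pdf`.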


Update: Originally, I had proposed to use a resolution of 1 PPI. It seems this doesn't work with Ghostscript. I now tested with 2 PPI. This works.


Update 2: See also the following (new) question with the answer:

It provides some sample PostScript code which completely removes all (raster) images from the PDF, leaving the rest of the page layout unchanged.

It also reflects the expanded new capabilities of Ghostscript which can now selectively remove either all text, or all raster images, or all vector objects from a PDF, or any combination of these 3 types.

Other tips

You can use the draft option of cpdf:

cpdf -draft in.pdf -o out.pdf

This should work in most situations, but file a bug report if it doesn't do the right thing for you.

Disclosure: I am the author of cpdf.

Time has passed, and development of Ghostscript has progressed...

The latest releases have the following new command line parameters. These can be added to the command line:

  1. -dFILTERIMAGE: produces an output where all raster images are removed.

  2. -dFILTERTEXT: produces an output where all text elements are removed.

  3. -dFILTERVECTOR: produces an output where all vector drawings are removed.

Any two of these options can be combined.

Example command:

gs -o noimage.pdf -sDEVICE=pdfwrite -dFILTERIMAGE input.pdf

More details (including some illustrative screenshots) can be found in my answer to "How can I remove all images from a PDF?".

Unfortunately, there is no Free/Open Source Software utility available that separates images and text into different layers. Nor is there a free-as-in-beer one...

This task can only be achieved with various commercial software solutions. Since you didn't exclude this in your question, but asked for 'whatever commandline tool possible', I'll tell you my favorite one:

A version for CLI usage (which includes a powerful SDK enabling lots of low-level PDF manipulations) is available, and this is supported on all major OS platforms, including Linux.

callas offers a fully featured free trial license, which is valid for (I believe) 14 days.

gs -o noImages.pdf    -sDEVICE=pdfwrite -dFILTERIMAGE                 input.pdf
gs -o noText.pdf      -sDEVICE=pdfwrite -dFILTERTEXT                  input.pdf
gs -o noVectors.pdf   -sDEVICE=pdfwrite -dFILTERVECTOR                input.pdf
gs -o onlyImages.pdf  -sDEVICE=pdfwrite -dFILTERVECTOR -dFILTERTEXT   input.pdf
gs -o onlyText.pdf    -sDEVICE=pdfwrite -dFILTERVECTOR -dFILTERIMAGE  input.pdf
gs -o onlyVectors.pdf -sDEVICE=pdfwrite -dFILTERIMAGE  -dFILTERTEXT   input.pdf
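
The three "only..." commands above follow one pattern: to keep one content type, filter out the other two. A small wrapper along these lines might make that easier to remember. It is a sketch, not anything shipped with Ghostscript: the function name keep_only and the GS override variable are invented here; only the -dFILTER* switches come from the commands above.

```shell
# keep_only KIND IN OUT
# Keep one content type (text, images or vectors) by filtering the other two.
# GS is a hypothetical override for the Ghostscript binary;
# GS=echo prints the arguments instead of running gs.
keep_only() {
  case $1 in
    text)    filters='-dFILTERIMAGE -dFILTERVECTOR' ;;
    images)  filters='-dFILTERTEXT -dFILTERVECTOR' ;;
    vectors) filters='-dFILTERTEXT -dFILTERIMAGE' ;;
    *)
      echo "usage: keep_only text|images|vectors in.pdf out.pdf" >&2
      return 2
      ;;
  esac
  # $filters is deliberately unquoted so it splits into two switches.
  "${GS:-gs}" -o "$3" -sDEVICE=pdfwrite $filters "$2"
}
```

For the original question (text only, no images or vector drawings) that would be `keep_only text input.pdf onlyText.pdf`.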
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow