Question

Most PDF files found on the Web have compressed, unreadable data streams. Is it possible to uncompress the internal content of a PDF file using Acrobat or Acrobat Distiller, so that the source code can be read in a text editor?

P.S. This question is inspired by this answer, which explains how it can be done with Ghostscript.


Solution 3

This is easy with qpdf and pdftk.

With Adobe Acrobat you can inspect the internal structure after running a preflight check on the PDF: run Preflight with some profile (e.g. one that detects PDF syntax errors), then choose Options->Internal PDF structure. However, there is no way to get something editable with a text editor this way.

OTHER TIPS

qpdf and pdftk have already been mentioned. To show the commands:

$ qpdf --qdf --object-streams=disable orig.pdf uncompressed-orig.pdf
$ pdftk orig.pdf output uncompressed-orig.pdf uncompress

mutool however hasn't been mentioned yet:

$ mutool clean -d -a orig.pdf uncompressed-orig.pdf

mutool is a command-line tool that ships with the lightweight MuPDF PDF and document viewer.

I do not think you can uncompress the streams of PDF objects with Acrobat or Distiller (unless you have additional payware plugins available).

Use cpdf:

cpdf -decompress in.pdf -o out.pdf

and then the graphics operators for each page can be read in a text editor. You'll need a copy of the PDF standard as a reference, though.

Disclosure: I am the author of cpdf.
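
To illustrate what "uncompressing" means here, the following is a minimal Python sketch, not how any of the tools above are implemented. It assumes the stream uses the common /FlateDecode filter (zlib/deflate) and it naively scans for the stream/endstream keywords instead of parsing the file properly, so it ignores /Length, object streams and every other filter. The function name inflate_streams and the sample page content are hypothetical.

```python
import re
import zlib

# Naive scan for the raw bytes between the "stream" and "endstream" keywords.
# A real PDF parser would honour /Length instead (assumption: no such parsing here).
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return the decoded bytes of every stream that zlib can inflate."""
    decoded = []
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the end-of-line bytes before "endstream".
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            # Not FlateDecode (e.g. an image filter) or already uncompressed.
            continue
        if data:
            decoded.append(data)
    return decoded


# Hypothetical page content: show the text "Hello" in font /F1 at 12 pt.
content = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"
packed = zlib.compress(content)
fragment = (
    b"4 0 obj\n<< /Length " + str(len(packed)).encode("ascii")
    + b" /Filter /FlateDecode >>\nstream\n" + packed + b"\nendstream\nendobj\n"
)

streams = inflate_streams(fragment)
print(streams[0].decode("ascii"))  # prints: BT /F1 12 Tf 72 720 Td (Hello) Tj ET
```

The printed line is the kind of operator sequence that becomes readable in a text editor once a tool has rewritten the file with uncompressed streams. For real files, prefer one of the tools above, since they also handle cross-reference tables and object streams.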

Licensed under: CC-BY-SA with attribution