Photo extraction from a PDF file

https://stackoverflow.com/questions/456415

19-08-2019

Question

Does anyone know of a way I can extract all JPG images from a PDF file? I am currently using Acrobat, and I have a file containing about 1500 photos that I need to extract, but doing them one at a time would be far too time-consuming. Any ideas?

Thanks.

Solution

After a little searching I found this; I hope it helps. (I can't think of any reason there'd be 1500 images in a PDF.)

http://pdf-image-extraction-wizard.lastdownload.com/

OTHER TIPS

There are free utilities that can help you do this. For example, a quick Google search turned up this one.

On a Mac, try the app FileJuicer. It normally works really well at extracting images from PDFs.

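Coding answer. The extraction itself is done by the pdfimages command-line tool (part of Poppler, free software), which must be installed and on your PATH. This snippet came out of a larger script that also imported PIL, pytesseract and cv2 for other functions; only os and subprocess are needed for this bit.

import os
import subprocess

# Strip the images out of the PDF and put them in the relevant directory
def image_exporter(pdf_path, output_dir):
    # Create the output directory if it does not exist yet
    os.makedirs(output_dir, exist_ok=True)

    # -all keeps each image in its native format (JPEG images stay JPEG);
    # output files are named <output_dir>/prefix-NNN.<ext>
    cmd = ['pdfimages', '-all', pdf_path,
           os.path.join(output_dir, 'prefix')]
    # check=True raises CalledProcessError if pdfimages exits with an error
    subprocess.run(cmd, check=True)
    print('Images extracted:')
    print(os.listdir(output_dir))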
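
Since the question asks specifically for JPG files, here is a rough sketch of how the extracted files could be filtered afterwards. The helper name list_jpegs is made up for illustration, and it assumes pdfimages has written JPEG-encoded images with a .jpg (or .jpeg) extension directly into the output directory.

```python
import os

def list_jpegs(output_dir):
    """Return sorted paths of the .jpg/.jpeg files found directly in output_dir."""
    names = sorted(os.listdir(output_dir))
    # Compare extensions case-insensitively so .JPG is picked up too
    return [os.path.join(output_dir, name) for name in names
            if name.lower().endswith(('.jpg', '.jpeg'))]
```

For example, after calling image_exporter('photos.pdf', 'out') (both paths are placeholders), list_jpegs('out') would return only the JPEG files and skip anything pdfimages saved in another format.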

Licensed under: CC-BY-SA with attribution
