Wand convert pdf to jpeg and storing pages in file-like objects

https://stackoverflow.com/questions/18821145

28-06-2022

Question

I am trying to convert a PDF to JPEGs using Wand by iterating over the SingleImage objects in image.sequence and saving each one separately. Each image is stored on AWS, with database references, using Django.

from django.core.files.uploadedfile import SimpleUploadedFile
from wand.image import Image

image_pdf = Image(blob=pdf_blob)
image_jpeg = image_pdf.convert('jpeg')
for img in image_jpeg.sequence:
    memory_file = SimpleUploadedFile(
        "{}.jpeg".format(img.page_number),
        img.container.make_blob())
    spam = Spam.objects.create(
        page_image=memory_file,
        caption="Spam")

This doesn't work: img.container refers to the parent Image, so the first page is written over and over again. How do I get the second frame/page for saving?

Solution 2

It seems you cannot get per-page blobs without messing with ctypes, so this is my solution:

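import re
import shutil
import tempfile

from django.core.files.uploadedfile import SimpleUploadedFile
from path import Path  # path.py wrapper for os.path (older releases export it as `path`)
from wand.image import Image

image_pdf = Image(blob=pdf_blob)
image_jpeg = image_pdf.convert('jpeg')
temp_dir = Path(tempfile.mkdtemp())
# Set the base file name; each page ends up in its own numbered file
image_jpeg.save(filename=temp_dir / 'pdf_title.jpeg')
images = temp_dir.files()

# Sort by the page number embedded in each file name
sorted_images = sorted(
    images,
    key=lambda img_path: int(re.search(r'\d+', img_path.name).group())
)
for img in sorted_images:
    with open(img, 'rb') as img_fd:
        memory_file = SimpleUploadedFile(
            img.name,
            img_fd.read()
        )
        spam = Spam.objects.create(
            page_image=memory_file,
            caption="Spam Spam",
        )
# Remove the temporary directory and everything in it
shutil.rmtree(temp_dir)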

Not as clean as doing it all in memory, but it gets it done.
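
To see why the sort key parses the number instead of relying on plain string order, here is a small standalone sketch. The file names are made up, following the numbered pattern ImageMagick typically uses when it splits a multi-page image into single-page files. The `page_index` helper is hypothetical, and unlike the lambda above it falls back to 0 when a name contains no digits (an assumption about how a single-page result would be named).

```python
import re


def page_index(file_name):
    """Return the first run of digits in file_name as an int, or 0 if there is none."""
    match = re.search(r"\d+", file_name)
    return int(match.group()) if match else 0


# Made-up names in the style of a split multi-page save
names = [
    "pdf_title-10.jpeg",
    "pdf_title-2.jpeg",
    "pdf_title-0.jpeg",
    "pdf_title-1.jpeg",
]

# Plain string order puts page 10 before page 2
print(sorted(names))
# ['pdf_title-0.jpeg', 'pdf_title-1.jpeg', 'pdf_title-10.jpeg', 'pdf_title-2.jpeg']

# Numeric order keeps the pages in reading order
print(sorted(names, key=page_index))
# ['pdf_title-0.jpeg', 'pdf_title-1.jpeg', 'pdf_title-2.jpeg', 'pdf_title-10.jpeg']
```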

OTHER TIPS

Actually, you can get per-file blobs:

for img in image_jpeg.sequence:
    img_page = Image(image=img)

Then you can work with each img_page variable like a full-fledged image: change format, resize, save, etc.
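
Building on that, here is a rough sketch of how the whole thing might be done in memory. It is not tested against every Wand version, and reading PDFs needs Ghostscript available to ImageMagick. The function name `iter_page_blobs`, the `image_cls` parameter (there only so a stand-in class can be swapped in for testing) and the default resolution of 150 are my own assumptions, not part of Wand.

```python
def iter_page_blobs(pdf_blob, image_cls=None, resolution=150):
    """Yield (page_index, jpeg_bytes) for every page of a PDF held in memory.

    image_cls defaults to wand.image.Image; it is a hypothetical hook that
    lets a stand-in class be used instead of the real one.
    """
    if image_cls is None:
        # Imported lazily so the module loads even where Wand is not installed
        from wand.image import Image as image_cls

    with image_cls(blob=pdf_blob, resolution=resolution) as pdf:
        for index, frame in enumerate(pdf.sequence):
            # Copy the single frame into a full Image of its own
            with image_cls(image=frame) as page:
                page.format = "jpeg"
                yield index, page.make_blob()
```

Each yielded blob can then be wrapped the same way as in the question, for example `SimpleUploadedFile("{}.jpeg".format(index), blob)`, without touching the file system.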

Licensed under: CC-BY-SA with attribution
