Question

I'm trying to set up the sorl-thumbnail Django app to provide thumbnails of PDF files for a web site running on Windows Server 2008 R2 with the Apache web server.

I've had sorl-thumbnail functional with the PIL backend for thumbnail generation of jpeg images - which was working fine.

Since PIL cannot read pdf-files I wanted to switch to the graphicsmagick backend. I've installed and tested the graphicsmagick/ghostscript combination. From the command line

gm convert foo.pdf -resize 400x400 bar.jpg

generates the expected jpg thumbnail. It also works for jpg to jpg thumbnail generation.

However, when called from sorl-thumbnail, ghostscript crashes. From django python shell (python manage.py shell) I use the low-level command described in the sorl docs and pass in a FieldFile instance (ff) pointing to foo.pdf and get the following error:

In [8]: im = get_thumbnail(ff, '400x400', quality=95)
**** Warning: stream operator isn't terminated by valid EOL.
**** Warning: stream Length incorrect.
**** Warning:  An error occurred while reading an XREF table.
**** The file has been damaged.  This may have been caused
**** by a problem while converting or transfering the file.
**** Ghostscript will attempt to recover the data.
**** Error:  Trailer is not found.
GPL Ghostscript 9.07: Unrecoverable error, exit code 1

Note that ff is pointing to the same file that converts fine when using gm convert from command line.

I've tried also passing an ImageFieldFile instance (iff) and get the following error:

In [5]: im = get_thumbnail(iff, '400x400', quality=95)
identify.exe: Corrupt JPEG data: 1 extraneous bytes before marker 0xdb `c:\users\thin\appdata\local\temp\tmpxs7m5p' @ warning/jpeg.c/JPEGWarningHandler/348.
identify.exe: Corrupt JPEG data: 1 extraneous bytes before marker 0xc4 `c:\users\thin\appdata\local\temp\tmpxs7m5p' @ warning/jpeg.c/JPEGWarningHandler/348.
identify.exe: Corrupt JPEG data: 1 extraneous bytes before marker 0xda `c:\users\thin\appdata\local\temp\tmpxs7m5p' @ warning/jpeg.c/JPEGWarningHandler/348.
Invalid Parameter - -auto-orient

Changing back sorl settings to use the default PIL backend and repeating the command for jpg to jpg conversion, the thumbnail image is generated without errors/warnings and available through the cache.

It seems that sorl is copying the source file to a temporary file before passing it to gm - and that the problem originates in this copy operation.

I've found what I believe to be the copy operation in the sources of sorl_thumbnail-11.12-py2.7.egg\sorl\thumbnail\engines\convert_engine.py lines 47-55:

class Engine(EngineBase):

    ...

    def get_image(self, source):
        """
        Returns the backend image objects from a ImageFile instance
        """
        handle, tmp = mkstemp()
        with open(tmp, 'w') as fp:
            fp.write(source.read())
        os.close(handle)
        return {'source': tmp, 'options': SortedDict(), 'size': None}

Could the problem be here - I don't see it!

Any suggestions of how to overcome this problem would be greatly appreciated! I'm using django 1.4, sorl-thumbnail 11.12 with memcached and ghostscript 9.07.


Solution

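After some trial and error, I found that the problem could be solved by changing the write mode from 'w' to 'wb'. On Windows, a file opened in text mode ('w') has every newline byte written to it translated to a carriage return plus newline, which corrupts binary data such as PDF and JPEG files; that is consistent with the damaged XREF table and the "extraneous bytes" warnings in the question. The sources of sorl_thumbnail-11.12-py2.7.egg\sorl\thumbnail\engines\convert_engine.py lines 47-55 now read: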

class Engine(EngineBase):

    ...

    def get_image(self, source):
        """
        Returns the backend image objects from a ImageFile instance
        """
        handle, tmp = mkstemp()
        with open(tmp, 'wb') as fp:
            fp.write(source.read())
        os.close(handle)
        return {'source': tmp, 'options': SortedDict(), 'size': None}

I believe there are two other locations in convert_engine.py where the same change should be made. With that change in place, the gm convert command was able to process the file.
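To illustrate why the mode matters, here is a minimal sketch (not sorl's actual code) of a binary-safe copy to a temporary file. The helper name copy_to_tempfile and the sample payload are made up for this example. It also shows, by simulation, what Windows text mode would have done to the same bytes.

```python
import os
import shutil
import tempfile
from io import BytesIO


def copy_to_tempfile(source, chunk_size=64 * 1024):
    """Copy a binary file-like object to a temp file and return its path.

    Hypothetical helper for illustration; not part of sorl-thumbnail.
    """
    handle, tmp = tempfile.mkstemp()
    # Reuse the descriptor returned by mkstemp; 'wb' turns off newline translation.
    with os.fdopen(handle, 'wb') as fp:
        # Copy in chunks instead of reading the whole source into memory.
        shutil.copyfileobj(source, fp, chunk_size)
    return tmp


# Sample bytes containing bare LF characters, as PDF and JPEG data routinely do.
payload = b'%PDF-1.4\nstream\n\x00\xff\xdb\nendstream\n'

path = copy_to_tempfile(BytesIO(payload))
with open(path, 'rb') as fp:
    round_trip = fp.read()
os.remove(path)

# Simulation of Windows text mode: every LF is written as CR LF.
text_mode_result = payload.replace(b'\n', b'\r\n')

print(round_trip == payload)                  # binary copy is byte-identical
print(len(text_mode_result) - len(payload))   # extra bytes text mode would add
```

Each extra CR shifts the byte offsets that a PDF's XREF table relies on and places a stray byte inside JPEG data, which is why the external tools report the file as damaged.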

However, since my PDFs are fairly large multipage files, I then ran into other problems. The most important is that the get_image method makes a full copy of the file before the thumbnail is generated. With file sizes around 50 MB this turns out to be very slow, so in the end I opted for bypassing sorl and calling gm directly. The thumbnail is then stored in a standard ImageField. Not so elegant, but much faster.
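Calling gm directly could look roughly like the sketch below. It only wraps the same command line shown in the question; the function names build_gm_command and make_thumbnail are hypothetical, and it assumes the gm executable is on the PATH. Saving the result into the ImageField is left out.

```python
import subprocess


def build_gm_command(src, dst, size='400x400'):
    """Return the argument list for: gm convert <src> -resize <size> <dst>."""
    return ['gm', 'convert', src, '-resize', size, dst]


def make_thumbnail(src, dst, size='400x400'):
    """Run GraphicsMagick on the original file, with no intermediate copy.

    Raises subprocess.CalledProcessError if gm exits with a non-zero status.
    """
    subprocess.check_call(build_gm_command(src, dst, size))


# Example (requires GraphicsMagick and Ghostscript to be installed):
# make_thumbnail('foo.pdf', 'bar.jpg')
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.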

Licensed under: CC-BY-SA with attribution