Domanda

I am working on an application that requires images submitted to it to be lossless. Currently I am opening the image with PIL and checking if the "format" attribute is a lossless format. This requires me to manually keep a list of formats, and I have no idea if, for instance, a jpeg that was submitted just happens to have the lossless variant applied.

import PIL
import PIL.Image


def validate_image(path):
    img = PIL.Image.open(path)
    if not img.format.lower() in ['bmp', 'gif', 'png', ...]:
        raise Exception("File %s has invalid image format %s" % (path, img.format))

Is there a better way to check if the image file is lossless?

È stato utile?

Soluzione

I think I now understand things: You want to open the images via PIL. You want to reject lossy images because you're doing scientific processing of some kind that needs all that lost data because information that's unimportant for human visual processing is important for your algorithms.

PIL does not have any kind of interface at the top level to distinguish different types of compression. You could reach inside the image decoders and assume that anything that uses the "raw" decoder is lossless, but even if you wanted to do that, that's too limited—it'll rule out GIF, LZW-compressed TIFF, etc. along with JPEG, JPEG-compressed TIFF, etc.

Keep in mind that the real problem is here is messaging and documentation—managing user expectations. The check for lossy images is really just a heuristic, a way to catch the more obvious mistakes and remind the user what the requirements are. So, you don't need something perfect, but having something pretty good may be helpful anyway.

So, there are only a few options, none of them very good:

  1. Hack up PIL's decoder source to retain the encoding information and pass it up to the top level. This is, obviously, going to take some non-trivial work, in 30 different importers, possibly involving C as well as Python, and it will result in a patch that you have to maintain against a (slowly-)evolving codebase—although of course you can always submit it upstream and hope that it makes it into future versions of PIL.

  2. Dig into the decoders themselves to get the information at runtime. The only semi-standard thing you can really find is whether they use the raw decoder or the bit decoder, which isn't useful at all (many lossless formats will need the bit decoder), so you'll probably end up reading all 30 importers and writing a dozen or so pieces of code to extract information from them.

  3. Use another library along with (or in place of) PIL. For example, while ImageMagick is definitely not significantly easier than PIL, it does have an API to tell you what type of compression an image file uses. Basically, if it's UndefinedCompression or JPEGCompression it's lossy, anything else, it's lossless. The major downside (besides needing to install two image libraries) is that there will be files that PIL can open but IM can't, and vice-versa, and multi-image files that PIL and IM handle differently, and so on.

  4. Do what you're already doing. Read through the 30 importers to make a list of which are lossy and which are lossless. To handle cases like JPEG and TIFF that are sometimes lossless, you may want to write code that doesn't flat-out reject them, but instead gives a warning saying "These files may be lossy. Are you sure you want to import them?" (Or, alternatively, just offer an "I know what I'm doing" override for all lossy formats, and then just consider JPEG and TIFF lossy.)

For many use cases, I'd be very wary of going with #4, but for yours, it actually seems pretty reasonable. You're not trying to block lossy images because your code will crash, or for security reasons, or anything like that; you're just trying to warn people that they're going to waste a lot of time getting useless information if they submit a JPEG, right?

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top