Is there a way to infer what image format a file is, without reading the entire file?
Question
Is there a good way to see what format an image is, without having to read the entire file into memory?
Obviously this would vary from format to format (I'm particularly interested in TIFF files) but what sort of procedure would be useful to determine what kind of image format a file is without having to read through the entire file?
BONUS: What if the image is a Base64-encoded string? Any reliable way to infer it before decoding it?
Solution
Most image file formats have unique bytes at the start. The unix file
command looks at the start of the file to see what type of data it contains. See the Wikipedia article on Magic numbers in files and magicdb.org.
OTHER TIPS
Sure there is. Like the others have mentioned, most images start with some sort of 'Magic', which will always translate to some sort of Base64 data. The following are a couple examples:
A Bitmap will start with Qk3
A Jpeg will start with /9j/
A GIF will start with R0l
(That's a zero as the second char).
And so on. It's not hard to take the different image types and figure out what they encode to. Just be careful, as some have more than one piece of magic, so you need to account for them in your B64 'translation code'.
Either file
on the *nix command-line or reading the initial bytes of the file. Most files come with a unique header in the first few bytes. For example, TIFF's header looks something like this:
0x00000000: 4949 2a00 0800 0000For more information on the TIFF file format specifically if you'd like to know what those bytes stand for, go here.
A comprehensive site of file formats is available at:
TIFFs will begin with either II or MM (Intel byte ordering or Motorolla).
The TIFF 6 specification can be downloaded here and isn't too hard to follow