I thought a JPEG is a JPEG is a JPEG.
Actually, most files referred to as "a JPEG file" are either JFIF or Exif. :-)
Exif uses the structure of JFIF, so you can parse them just the same. But because JFIF specifies that the first APP segment must be APP0/JFIF, and Exif says that the first APP segment must be APP1/Exif, they are not really compatible. Some JFIF files carry an Exif APP1 segment later in the file, purely for metadata. Some "JPEG"s contain neither an Exif nor a JFIF APP segment, but still contain valid JPEG code streams. Most software glosses over this fact though.
Is there a good reason for filtering based on particular values of the APP segment?
Depends. For example, if you want to filter out Exif only, or ISO JPEG only, then yes. If you want to read as many "JPEG"s as possible, then you obviously don't want this.
Some software (e.g. Java's default JPEGImageReaderSpi, used by ImageIO, since you mention Java) uses just the SOI marker (0xFF, 0xD8) to identify JPEG. Checking that the next byte is 0xFF is an extra precaution to filter out false positives.
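To make the check concrete, here is a minimal sketch of that kind of signature test. The class and method names (JpegSniffer, looksLikeJpeg) are made up for illustration; this is not the actual JPEGImageReaderSpi code, just the same idea: SOI, plus the first byte of whatever marker follows.

```java
// Hypothetical helper, not part of ImageIO.
final class JpegSniffer {
    private JpegSniffer() {}

    /**
     * Returns true if the data starts with the SOI marker (0xFF 0xD8)
     * followed by 0xFF, i.e. the start of the next marker.
     */
    static boolean looksLikeJpeg(byte[] data) {
        return data != null
                && data.length >= 3
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8
                && (data[2] & 0xFF) == 0xFF; // extra precaution against false positives
    }
}
```

Note that this deliberately says nothing about which APP segments follow, so it accepts JFIF, Exif and "bare" JPEG streams alike.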
How exactly does the APP segment affect the JPEG image?
Some APP segments affect how the compressed JPEG data is to be interpreted. Most JPEG reading software needs to be aware of at least APP0/JFIF, APP1/Exif, APP2/ICC_PROFILE and APP14/Adobe to properly interpret and convert color from the compressed data. Ignoring these will most likely produce images with strange-looking or inaccurate colors.
Other segments, like the APP0/JFXX (thumbnail extension), APP13/Photoshop 3.0 and APP1/XMP tags are used mainly for metadata, and can probably be ignored.
Also note that the APPn segments start with a null-terminated ASCII string after the APPn marker, to fully identify the APP segment type. It's not enough to just look at the marker.
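As a rough illustration of the last point, the sketch below walks the segments that precede the image data and reports each APPn marker together with its identifier string. The names (AppSegmentLister, listAppSegments) are hypothetical, and the code assumes the whole file is already in a byte array and that every marker before SOS carries a two-byte big-endian length, which holds for ordinary files but is not a full parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class AppSegmentLister {
    private AppSegmentLister() {}

    /** Lists APPn segments as "APP<n>/<identifier>", in file order. */
    static List<String> listAppSegments(byte[] data) {
        List<String> result = new ArrayList<>();
        // Must start with SOI (0xFF 0xD8).
        if (data.length < 2 || (data[0] & 0xFF) != 0xFF || (data[1] & 0xFF) != 0xD8) {
            return result;
        }
        int pos = 2;
        while (pos + 4 <= data.length && (data[pos] & 0xFF) == 0xFF) {
            int marker = data[pos + 1] & 0xFF;
            if (marker == 0xFF) { // fill byte before the real marker
                pos++;
                continue;
            }
            if (marker == 0xDA || marker == 0xD9) { // SOS or EOI: no more headers
                break;
            }
            // Segment length includes the two length bytes themselves.
            int length = ((data[pos + 2] & 0xFF) << 8) | (data[pos + 3] & 0xFF);
            if (length < 2 || pos + 2 + length > data.length) {
                break; // truncated or corrupt segment
            }
            if (marker >= 0xE0 && marker <= 0xEF) { // APP0..APP15
                int start = pos + 4;
                int end = pos + 2 + length;
                int nul = start;
                while (nul < end && data[nul] != 0) {
                    nul++;
                }
                // If no NUL is found, the whole payload is taken as the identifier.
                String id = new String(data, start, nul - start, StandardCharsets.US_ASCII);
                result.add("APP" + (marker - 0xE0) + "/" + id);
            }
            pos += 2 + length;
        }
        return result;
    }
}
```

With a result like this you can tell an APP1/Exif segment from an APP1 segment holding XMP, which the marker alone cannot do.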
PS: To read JPEGs in Java, you might want to have a look at my TwelveMonkeys ImageIO library, to expand the number of "JPEG" varieties ImageIO can read.