Question

The system I'm working on has a feature to extract metadata from JPEG files using the com.drew.metadata package (http://www.drewnoakes.com/code/exif/). However, that is limited to JPEG files, and now a customer has asked about extracting IPTC from TIF, and possibly other image formats.

Does anyone know of APIs similar to Drew Noakes' one that can extract IPTC from TIF?

Ideally this would be a pure Java approach like the com.drew.metadata one.


Solution

This is an old question. Nowadays my metadata-extractor library supports TIFF files, as well as JPEG, WebP, PSD, PNG, GIF, BMP, ICO, PCX and many camera raw formats.

The project recently moved to GitHub:

https://github.com/drewnoakes/metadata-extractor

It is also available via Maven:

http://search.maven.org/#search%7Cga%7C1%7Cdrewnoakes

OTHER TIPS

I recently spent some time coding the metadata manipulation part of the icafe Java image library, making it able to insert and extract metadata types such as EXIF, IPTC, PHOTOSHOP, ICC_Profile and thumbnails. Some functions are better than others, but they all work reasonably well. All metadata reading goes through a common interface, shown below:

import java.io.IOException;
import java.util.Collection;
import java.util.Iterator;
import java.util.Map;

import com.icafe4j.image.meta.Metadata;
import com.icafe4j.image.meta.MetadataEntry;
import com.icafe4j.image.meta.MetadataType;
import com.icafe4j.image.meta.iptc.IPTC;

public class ExtractIPTC {

    public static void main(String[] args) throws IOException {
        // Detects the image type and reads every metadata type found in the file
        Map<MetadataType, Metadata> metadataMap = Metadata.readMetadata(args[0]);
        IPTC iptc = (IPTC) metadataMap.get(MetadataType.IPTC);

        if (iptc != null) {
            Iterator<MetadataEntry> iterator = iptc.iterator();

            while (iterator.hasNext()) {
                MetadataEntry item = iterator.next();
                printMetadata(item, "", "     ");
            }
        }
    }

    // Prints one entry, then recurses into its children if it is a group
    private static void printMetadata(MetadataEntry entry, String indent, String increment) {
        String value = entry.getValue();
        System.out.println(indent + entry.getKey()
                + (value == null || value.isEmpty() ? "" : ": " + value));
        if (entry.isMetadataEntryGroup()) {
            indent += increment;
            Collection<MetadataEntry> entries = entry.getMetadataEntries();
            for (MetadataEntry e : entries) {
                printMetadata(e, indent, increment);
            }
        }
    }
}

If we pass the image "iptc.tif" from the "images" directory of the project as the argument, we will get the following information:

Record number 2: Application Record
Dataset name: Keywords
Dataset tag: 25[0x0019]
Dataset size: 6
Dataset value: Bayern
Record number 2: Application Record
Dataset name: Keywords
Dataset tag: 25[0x0019]
Dataset size: 11
Dataset value: Deckelstein
Record number 2: Application Record
Dataset name: Keywords
Dataset tag: 25[0x0019]
Dataset size: 7
Dataset value: Germany
Record number 2: Application Record
Dataset name: Keywords
Dataset tag: 25[0x0019]
Dataset size: 10
Dataset value: Nittendorf
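To show where the numbers in that output come from: IPTC data is stored as IPTC-IIM datasets, each starting with the marker byte 0x1C, followed by a record number, a dataset number and a 2-byte big-endian length, then the value. Here is a minimal, self-contained sketch of parsing such datasets from a raw byte array. It is not icafe's code; the class and method names are made up for illustration, the ISO-8859-1 decoding is an assumption, and it deliberately rejects "extended" datasets (length with the high bit set) instead of decoding them.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IimParser {

    /** One IPTC-IIM dataset: record number, dataset number and raw value bytes. */
    public static final class Dataset {
        public final int record;
        public final int tag;
        public final byte[] value;

        Dataset(int record, int tag, byte[] value) {
            this.record = record;
            this.tag = tag;
            this.value = value;
        }

        /** Decodes the value as ISO-8859-1 (an assumption; dataset 1:90 may declare another charset). */
        public String text() {
            return new String(value, StandardCharsets.ISO_8859_1);
        }

        /** Formats the dataset number like the output above, e.g. 25[0x0019]. */
        public String tagLabel() {
            return String.format("%d[0x%04X]", tag, tag);
        }
    }

    /** Parses consecutive standard datasets; stops at the first byte that is not 0x1C. */
    public static List<Dataset> parse(byte[] data) {
        List<Dataset> result = new ArrayList<>();
        int pos = 0;
        while (pos + 5 <= data.length) {
            if ((data[pos] & 0xFF) != 0x1C) {
                break; // padding or non-IIM bytes
            }
            int record = data[pos + 1] & 0xFF;
            int tag = data[pos + 2] & 0xFF;
            int size = ((data[pos + 3] & 0xFF) << 8) | (data[pos + 4] & 0xFF);
            if ((size & 0x8000) != 0) {
                throw new IllegalArgumentException("Extended datasets are not handled in this sketch");
            }
            pos += 5;
            if (pos + size > data.length) {
                throw new IllegalArgumentException("Truncated dataset at offset " + (pos - 5));
            }
            result.add(new Dataset(record, tag, Arrays.copyOfRange(data, pos, pos + size)));
            pos += size;
        }
        return result;
    }
}
```

Fed the raw IPTC block behind the output above, this kind of parser would yield four datasets with record 2 and dataset number 25 (Keywords), with sizes 6, 11, 7 and 10.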

The above code works for JPEG and TIFF alike. It automatically detects the image type and delegates to the corresponding code to do the work.
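I haven't checked exactly how icafe detects the format, but a common approach is to sniff the leading "magic" bytes: JPEG files start with the SOI marker FF D8, and TIFF files start with "II" followed by 42 as a little-endian 16-bit value, or "MM" followed by 42 as a big-endian one. A minimal sketch, with hypothetical class and method names, that only recognises these two formats:

```java
public class ImageTypeSniffer {

    public enum ImageType { JPEG, TIFF, UNKNOWN }

    /** Looks at the first bytes of a file and guesses its type. */
    public static ImageType detect(byte[] header) {
        // JPEG: Start Of Image marker FF D8
        if (header.length >= 2
                && (header[0] & 0xFF) == 0xFF && (header[1] & 0xFF) == 0xD8) {
            return ImageType.JPEG;
        }
        if (header.length >= 4) {
            // Little-endian TIFF: "II" then 2A 00
            boolean le = header[0] == 'I' && header[1] == 'I' && header[2] == 0x2A && header[3] == 0x00;
            // Big-endian TIFF: "MM" then 00 2A
            boolean be = header[0] == 'M' && header[1] == 'M' && header[2] == 0x00 && header[3] == 0x2A;
            if (le || be) {
                return ImageType.TIFF;
            }
        }
        return ImageType.UNKNOWN;
    }
}
```

A reader would then switch on the returned value to hand the stream to the JPEG or TIFF code path.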

NOTE: a TIFF file can contain IPTC data in more than one place. One is the RichTiffIPTC tag; the other is buried inside a Photoshop tag. Currently, icafe only keeps one set of IPTC data: if both a Photoshop tag with IPTC data and a RichTiffIPTC tag exist, it keeps the RichTiffIPTC data; otherwise, it keeps the IPTC data from whichever tag exists. Keeping data from both places would not be a problem in itself, but the current implementation uses a map from each metadata type key to a single metadata instance, so only one instance per type is kept.
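For reference, the two places are the RichTIFFIPTC tag (33723, 0x83BB) and the Photoshop tag (34377, 0x8649), which holds Photoshop image resources with the IPTC block inside. The sketch below, with hypothetical names, only scans the first IFD of an in-memory TIFF and returns the raw bytes of whichever of the two tags is present. Sub-IFDs, later pages and digging the IPTC block out of the Photoshop resources (resource ID 0x0404) are all left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class TiffIptcLocator {

    public static final int TAG_RICH_TIFF_IPTC = 33723; // 0x83BB
    public static final int TAG_PHOTOSHOP = 34377;      // 0x8649, Photoshop image resources

    // Size in bytes of TIFF field types 1..12 (index 0 is unused)
    private static final int[] TYPE_SIZES = {0, 1, 1, 2, 4, 8, 1, 1, 2, 4, 8, 4, 8};

    /** Returns the raw value bytes of whichever IPTC-carrying tags appear in the first IFD. */
    public static Map<Integer, byte[]> findIptcCarriers(byte[] tiff) {
        if (tiff.length < 8) {
            throw new IllegalArgumentException("Too short to be a TIFF file");
        }
        ByteBuffer buf = ByteBuffer.wrap(tiff);
        // The first two bytes give the byte order used by every multi-byte field
        if (tiff[0] == 'I' && tiff[1] == 'I') {
            buf.order(ByteOrder.LITTLE_ENDIAN);
        } else if (tiff[0] == 'M' && tiff[1] == 'M') {
            buf.order(ByteOrder.BIG_ENDIAN);
        } else {
            throw new IllegalArgumentException("Not a TIFF file");
        }
        int ifdOffset = buf.getInt(4); // offset of the first IFD (assumed to fit in an int)
        int entryCount = buf.getShort(ifdOffset) & 0xFFFF;
        Map<Integer, byte[]> found = new LinkedHashMap<>();
        for (int i = 0; i < entryCount; i++) {
            int entry = ifdOffset + 2 + i * 12; // each IFD entry is 12 bytes
            int tag = buf.getShort(entry) & 0xFFFF;
            if (tag != TAG_RICH_TIFF_IPTC && tag != TAG_PHOTOSHOP) {
                continue;
            }
            int type = buf.getShort(entry + 2) & 0xFFFF;
            int count = buf.getInt(entry + 4);
            int typeSize = (type >= 1 && type < TYPE_SIZES.length) ? TYPE_SIZES[type] : 1;
            int length = count * typeSize;
            // Values of up to 4 bytes sit in the entry itself; larger ones are at an offset
            int valueStart = length <= 4 ? entry + 8 : buf.getInt(entry + 8);
            // Copy the bytes as stored, without byte-swapping: the IPTC block is a byte
            // stream even when a writer declares the tag as LONG
            found.put(tag, Arrays.copyOfRange(tiff, valueStart, valueStart + length));
        }
        return found;
    }
}
```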

Update: icafe can now combine IPTC data from both RichTiffIPTC and Photoshop IRB and remove duplicates.
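A minimal sketch of such a merge, assuming "duplicate" means the same record number, dataset number and value (my assumption; I haven't checked icafe's exact rule). The `Dataset` class here is a hypothetical stand-in, not icafe's type. A `LinkedHashSet` keeps the first occurrence of each dataset and preserves insertion order, so RichTIFFIPTC entries come first.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class IptcMerge {

    /** Hypothetical value type: record number, dataset number and decoded value. */
    public static final class Dataset {
        final int record;
        final int tag;
        final String value;

        public Dataset(int record, int tag, String value) {
            this.record = record;
            this.tag = tag;
            this.value = Objects.requireNonNull(value);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Dataset)) return false;
            Dataset d = (Dataset) o;
            return record == d.record && tag == d.tag && value.equals(d.value);
        }

        @Override
        public int hashCode() {
            return Objects.hash(record, tag, value);
        }

        @Override
        public String toString() {
            return record + ":" + tag + "=" + value;
        }
    }

    /** Combines both sources, keeping the first copy of each dataset. */
    public static List<Dataset> merge(List<Dataset> richTiffIptc, List<Dataset> photoshopIrb) {
        Set<Dataset> merged = new LinkedHashSet<>(richTiffIptc);
        merged.addAll(photoshopIrb); // exact duplicates are ignored by the set
        return new ArrayList<>(merged);
    }
}
```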

Update2: Metadata, the base class of all metadata types in ICAFE, now implements the Iterable interface, so users can iterate through a collection of MetadataEntry objects. MetadataEntry itself follows the composite pattern, so a MetadataEntry can contain a collection of other MetadataEntry objects. Each MetadataEntry holds a key/value pair. This design allows a tree-structured traversal of the metadata entries.
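To illustrate the composite idea in isolation, here is a small stand-in `Entry` class (hypothetical, not icafe's `MetadataEntry`) plus a depth-first traversal that indents children one level deeper, similar in spirit to `printMetadata` in the example earlier in this answer:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class EntryTree {

    /** A key/value pair that may also contain child entries (composite pattern). */
    public static final class Entry {
        private final String key;
        private final String value;
        private final List<Entry> children = new ArrayList<>();

        public Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }

        /** Adds a child and returns this entry, so trees can be built fluently. */
        public Entry add(Entry child) {
            children.add(child);
            return this;
        }

        public boolean isGroup() { return !children.isEmpty(); }
        public List<Entry> getChildren() { return Collections.unmodifiableList(children); }
        public String getKey() { return key; }
        public String getValue() { return value; }
    }

    /** Depth-first traversal: one line per entry, children indented by two spaces. */
    public static void render(Entry entry, String indent, List<String> out) {
        boolean hasValue = entry.getValue() != null && !entry.getValue().isEmpty();
        out.add(indent + entry.getKey() + (hasValue ? ": " + entry.getValue() : ""));
        if (entry.isGroup()) {
            for (Entry child : entry.getChildren()) {
                render(child, indent + "  ", out);
            }
        }
    }
}
```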

There's a good example of using the Image I/O library to access IPTC here:

http://www.barregren.se/blog/how-read-exif-and-iptc-java-image-i-o-api

Unfortunately, you'll still have to handle some of the work yourself.
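As a minimal sketch of the Image I/O route for TIFF specifically, assuming Java 9 or later (where the `javax.imageio.plugins.tiff` package and a TIFF plugin ship with the JDK), you can look up the RichTIFFIPTC field (tag 33723) in the first image's metadata. The class and method names are hypothetical. It only locates the field; turning its bytes into IPTC datasets is the part you still have to do yourself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.plugins.tiff.TIFFDirectory;
import javax.imageio.plugins.tiff.TIFFField;
import javax.imageio.stream.ImageInputStream;

public class ImageIoTiffTags {

    public static final int TAG_RICH_TIFF_IPTC = 33723;

    /** Returns the field for tagNumber in image 0, or null if the tag is absent. */
    public static TIFFField findTiffField(byte[] tiffBytes, int tagNumber) {
        try (ImageInputStream in = ImageIO.createImageInputStream(new ByteArrayInputStream(tiffBytes))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IllegalArgumentException("No Image I/O reader for this input");
            }
            ImageReader reader = readers.next();
            try {
                // ignoreMetadata must be false, otherwise the tags are not kept
                reader.setInput(in, true, false);
                IIOMetadata metadata = reader.getImageMetadata(0);
                // Throws IllegalArgumentException if the metadata is not native TIFF metadata
                TIFFDirectory dir = TIFFDirectory.createFromMetadata(metadata);
                return dir.getTIFFField(tagNumber);
            } finally {
                reader.dispose();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```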

If you cannot find a pure Java implementation, you could consider using the Java bindings to ImageMagick (JMagick). That would allow for a plethora of different possible output formats.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow