Question

I am trying to edit images in a PDF file using the PDFBox library. So far my example works only for JPEG images: ImageIO.read() fails to decode images with the 'png' suffix. The code is below. My question: how can I do the same for all types of images in PDF documents? Can I still use ImageIO for this, or do I need another approach?

public static void main(String[] args) throws Exception {

    PDDocument doc = PDDocument.load("docs/input1.pdf");

    // Get all images from first page 
    Map<String, PDXObjectImage> pageImages = ((PDPage) doc.getDocumentCatalog().getAllPages().get(0)).getResources().getImages();
    if (pageImages != null) 
    {
        // iterate over the images
        Iterator<String> imageIter = pageImages.keySet().iterator();
        while (imageIter.hasNext()) 
        {
            String key =  imageIter.next();

            PDXObjectImage image = pageImages.get(key); // get page image object
            String suffix = image.getSuffix();  // get image suffix
            String imageName = key+'.'+suffix;  // compose image name

            System.out.print("process "+imageName+"... ");

            COSStream s = image.getCOSStream(); // get COSStream to manipulate
            BufferedImage img = ImageIO.read(s.getFilteredStream()); // get BufferedImage to edit

            if(img == null)
            {
                System.out.println("Can't decode");
            }
            else
            {
                paint(img.createGraphics()); // draw on it
                ImageIO.write(img, suffix, new File("out/"+imageName)); // write file to check result...

                // encode image back to COSStream
                OutputStream out = s.createFilteredStream();
                ImageIO.write(img, suffix, out);
                out.close();
                System.out.println("done");
            }
        }
    }
    doc.save("out/output1.pdf"); // save document
}   

/**
 * Draws a red rectangle for testing.
 * @param g graphics
 */
public static void paint(Graphics2D g) {
    int xpoints[] = {25, 245, 245, 25};
    int ypoints[] = {25, 25, 545, 545};
    g.setColor(Color.RED);
    g.fillPolygon(xpoints, ypoints, 4);
}

Solution

Rather than working with the stream of a PDXObjectImage, it is better to create a new PDXObjectImage instance and replace the old one in the resources collection. This approach is more generic. Use getRGBImage() to convert a PDXObjectImage to a BufferedImage, and a constructor (PDPixelMap, PDJpeg, etc.) to convert the edited result back to a PDXObjectImage. Note that you will still have problems with JBIG2 and JPEG 2000 images due to bugs. Here is the code I use to find and convert all images in a document:

// Recursive resource processor
// Images can also be nested inside PDXObjectForm objects
protected static void processResources(PDResources resources, PDDocument doc, String filename) throws IllegalArgumentException, SecurityException, IOException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, JBIG2Exception, ColorSpaceException, ICCProfileException
{
    if(resources == null) return;
    Map<String, PDXObject> xObjects = resources.getXObjects();
    if (xObjects == null) return;

    // iterate over all XObjects (images and forms)
    Iterator<String> imageIter = xObjects.keySet().iterator();
    while (imageIter.hasNext()) 
    {
        String key =  imageIter.next();

        PDXObject o = xObjects.get(key);

        if(o instanceof PDXObjectImage)
            xObjects.put(key, processImage((PDXObjectImage) o /*, some additional parms... */));

        if(o instanceof PDXObjectForm)
            processResources(((PDXObjectForm) o).getResources(), doc, filename);
    }

    resources.setXObjects(xObjects);
}

Note the resources.setXObjects() call at the end: without it, the changes made to the collection obtained from resources.getXObjects() will not be written back to the document.
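
The processImage() method called above is not shown. As a rough illustration of its middle step only (editing the decoded BufferedImage), here is a minimal sketch that uses nothing but the JDK. The class name ImageEditSketch and the method editImage() are hypothetical names made up for this example, and copying into a TYPE_INT_RGB image is an assumption chosen so that the red fill behaves the same whatever color model the decoded image had.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper class, not part of PDFBox.
final class ImageEditSketch {

    /**
     * Returns an RGB copy of the decoded image with the test rectangle
     * from paint() drawn on it. The input image is left unchanged.
     */
    static BufferedImage editImage(BufferedImage decoded) {
        // Assumption: an RGB copy is acceptable for the edited result.
        BufferedImage edited = new BufferedImage(
                decoded.getWidth(), decoded.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = edited.createGraphics();
        try {
            g.drawImage(decoded, 0, 0, null);   // copy the original pixels
            int[] xpoints = {25, 245, 245, 25}; // same polygon as paint()
            int[] ypoints = {25, 25, 545, 545};
            g.setColor(Color.RED);
            g.fillPolygon(xpoints, ypoints, 4);
        } finally {
            g.dispose();                        // release graphics resources
        }
        return edited;
    }

    public static void main(String[] args) {
        // Stand-in for an image decoded from the PDF.
        BufferedImage source = new BufferedImage(300, 600, BufferedImage.TYPE_BYTE_GRAY);
        BufferedImage result = editImage(source);
        System.out.println(Integer.toHexString(result.getRGB(100, 100)));
    }
}
```

Inside processImage(), the input would presumably come from image.getRGBImage() and the result would be wrapped back into a PDXObjectImage with one of the constructors mentioned above (PDPixelMap, PDJpeg), assuming the PDFBox 1.x variants that accept the document and a BufferedImage.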

Licensed under: CC-BY-SA with attribution