Question

I am using the code below to convert a PDF to a PNG image.

        Document document = new Document();
        try {
            document.setFile(myProjectPath);
            System.out.println("Parsed successfully...");
        } catch (PDFException ex) {
            System.out.println("Error parsing PDF document " + ex);
        } catch (PDFSecurityException ex) {
            System.out.println("Error encryption not supported " + ex);
        } catch (FileNotFoundException ex) {
            System.out.println("Error file not found " + ex);
        } catch (IOException ex) {
            System.out.println("Error handling PDF document " + ex);
        }

        // save page captures to file.
        float scale = 1.0f;
        float rotation = 0f;

        // Paint each page's content to an image and write the image to file
        InputStream fis2 = null;
        File file = null;
        for (int i = 0; i < 1; i++) {
            BufferedImage image = (BufferedImage) document.getPageImage(i,
                    GraphicsRenderingHints.SCREEN,
                    Page.BOUNDARY_CROPBOX, rotation, scale);
            RenderedImage rendImage = image;
            // capture the page image to file
            try {
                System.out.println("\t capturing page " + i);
                file = new File(myProjectActualPath + "myImage.png");
                ImageIO.write(rendImage, "png", file);
                fis2 = new BufferedInputStream(new FileInputStream(myProjectActualPath + "myImage.png"));
            } catch (IOException ioe) {
                System.out.println("IOException :: " + ioe);
            } catch (Exception e) {
                System.out.println("Exception :: " + e);
            }
            image.flush();
        }

myProjectPath is the path of the pdf file.

The problem is that I have a PDF file of 305 KB. When I convert it with the code above, the resulting image is 5.5 MB, which is unexpected. Why is this happening? Is there a way to compress the image? A solution that reduces the file size by lowering the pixel dimensions would also be fine.

Note: For other PDF files, the images come out at about 305 KB. This happens with only one PDF file, and I am not sure why.

Edit 1

I am using these jar files:

icepdf-core.jar
icepdf-viewer.jar

The imports I have are:

    import org.icepdf.core.exceptions.PDFException;
    import org.icepdf.core.exceptions.PDFSecurityException;
    import org.icepdf.core.pobjects.Document;
    import org.icepdf.core.pobjects.Page;
    import org.icepdf.core.util.GraphicsRenderingHints;

Solution

You could extract the images from the pdf (example using PDFBox):

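    // PDFBox 1.8.x API; "document" here is an org.apache.pdfbox.pdmodel.PDDocument,
    // not the ICEpdf Document used in the question.
    List<PDPage> pages = document.getDocumentCatalog().getAllPages();
    int index = 0;
    for (PDPage page : pages) {
        Map<String, PDXObjectImage> images = page.getResources().getImages();

        for (PDXObjectImage image : images.values()) {
            // write2file appends the image's own suffix (e.g. ".jpg");
            // the "image-" prefix is just an illustrative file name
            image.write2file("image-" + index++);
        }
    }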

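Alternatively, or in addition, you may want to save them to disk as JPG. JPG offers lossy compression, which usually produces much smaller files than PNG's lossless compression for photographic content.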
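
As a rough sketch of the JPG route, the following uses only the standard javax.imageio API to encode a BufferedImage as JPEG with an explicit quality setting. The class name JpegEncoder and the method toJpeg are illustrative names, not part of ICEpdf or PDFBox, and the right quality value depends on the content.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.MemoryCacheImageOutputStream;

// Hypothetical helper, not part of any PDF library.
final class JpegEncoder {

    /** Encodes an image as JPEG; quality runs from 0.0 (smallest file) to 1.0 (best quality). */
    static byte[] toJpeg(BufferedImage source, float quality) {
        // JPEG has no alpha channel, so copy onto an opaque RGB canvas,
        // filling any transparent pixels with white.
        BufferedImage rgb = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        g.drawImage(source, 0, 0, Color.WHITE, null);
        g.dispose();

        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (MemoryCacheImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(rgb, null, null), param);
        } catch (IOException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }
}
```

In the question's loop, something like `Files.write(Paths.get(myProjectActualPath + "myImage.jpg"), JpegEncoder.toJpeg(image, 0.8f))` could stand in for the `ImageIO.write` call. The value 0.8f is only a starting point to experiment with.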

You could even identify the format of the original image and use that when writing to disk by calling:

    image.getSuffix();

OTHER TIPS

You should be able to change the size of the file by changing scale. PDFs are often much smaller than rendered images. They can represent text and vector graphics compactly, whereas the rendered image needs a lot of bytes to represent the same content. I'm actually somewhat surprised that any of your PNGs are about the same size as the PDFs (unless the PDFs are just pictures).
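
To illustrate how pixel dimensions affect file size, here is a minimal sketch that resizes an already rendered image and measures its PNG size in memory. Resampling after rendering is a stand-in for passing a smaller scale to getPageImage, not the same operation. ScaleSketch, scaled and pngSize are hypothetical names chosen for this example.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;
import javax.imageio.stream.MemoryCacheImageOutputStream;

// Hypothetical helper, not part of ICEpdf.
final class ScaleSketch {

    /** Returns a copy of the image resized by the given factor (0.5 halves each side). */
    static BufferedImage scaled(BufferedImage source, double factor) {
        int w = Math.max(1, (int) Math.round(source.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(source.getHeight() * factor));
        BufferedImage target = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = target.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(source, 0, 0, w, h, null);
        g.dispose();
        return target;
    }

    /** Encodes the image as PNG in memory and returns the number of bytes produced. */
    static int pngSize(BufferedImage image) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (MemoryCacheImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
            ImageIO.write(image, "png", out);
        } catch (IOException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

Halving the scale quarters the pixel count, so comparing `ScaleSketch.pngSize(image)` with `ScaleSketch.pngSize(ScaleSketch.scaled(image, 0.5))` gives a feel for how much a smaller `scale` in the question's code might save. Rendering directly at the smaller scale avoids the extra resampling step.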
