Question

I have a PDF file (attached). My objective is to convert a PDF page to an image using PDFBox exactly as it appears (the same as using the Snipping Tool in Windows). The PDF contains all kinds of shapes and text.

I am using the following code:

PDDocument doc = PDDocument.load("Hello World.pdf");
PDPage firstPage = (PDPage) doc.getDocumentCatalog().getAllPages().get(67);
BufferedImage bufferedImage = firstPage.convertToImage(imageType,screenResolution);
ImageIO.write(bufferedImage, "png",new File("out.png"));

This is the PDF page I want to convert (image attached).

When I use this code, the resulting image file is completely wrong (out.png attached; this is the image produced by PDFBox).

How do I make PDFBox take something like a direct snapshot image?

Also, I noticed that the image quality of the PNG is not very good. Is there any way to increase the resolution of the generated image?

EDIT: Here is the PDF (see page number 68): https://drive.google.com/file/d/0B0ZiP71EQHz2NVZUcElvbFNreEU/edit?usp=sharing

EDIT 2: It seems that all the text is vanishing. I also tried using the PDFImageWriter class:

test.writeImage(doc, "png", null, 68, 69, "final.png",TYPE_USHORT_GRAY,200 );

The result is the same.


Solution 3

It turns out that JPedal (LGPL) does the conversion perfectly (just like a snapshot).

Here is what I used:

PdfDecoder decode_pdf = new PdfDecoder(true);

// set up font substitution before opening the file
FontMappings.setFontReplacements();

try {
    decode_pdf.openPdfFile("Hello World.pdf");
    decode_pdf.setExtractionMode(0, 800, 3);

    try {
        // render 40 consecutive pages, starting at page 2
        for (int i = 0; i < 40; i++) {
            BufferedImage img = decode_pdf.getPageAsImage(2 + i);
            ImageIO.write(img, "png", new File(String.valueOf(i) + "out.png"));
        }
    } catch (IOException ex) {
        Logger.getLogger(NewJFrame.class.getName()).log(Level.SEVERE, null, ex);
    }

    decode_pdf.closePdfFile();
} catch (PdfException e) {
    e.printStackTrace();
}

It works fine.

Other suggestions

Using PDFRenderer, it is possible to convert a PDF page into an image in Java.

Required JAR: PDFRenderer-0.9.0

package com.pdfrenderer.examples;

import java.awt.Graphics2D;
import java.awt.Image;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

import javax.imageio.ImageIO;

import com.sun.pdfview.PDFFile;
import com.sun.pdfview.PDFPage;

public class PdfToImage {
    public static void main(String[] args) {
        try {
            String sourceDir = "C:/Documents/Chemistry.pdf"; // path of the PDF file to convert
            String destinationDir = "C:/Documents/Converted/"; // converted pages are saved in this folder

            File sourceFile = new File(sourceDir);
            File destinationFile = new File(destinationDir);

            String fileName = sourceFile.getName().replace(".pdf", "_cover");

            if (sourceFile.exists()) {
                if (!destinationFile.exists()) {
                    destinationFile.mkdir();
                    System.out.println("Folder created in: " + destinationFile.getCanonicalPath());
                }

                RandomAccessFile raf = new RandomAccessFile(sourceFile, "r");
                FileChannel channel = raf.getChannel();
                ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
                PDFFile pdf = new PDFFile(buf);
                int pageNumber = 62; // which PDF page to convert
                PDFPage page = pdf.getPage(pageNumber);

                System.out.println("Total pages: " + pdf.getNumPages());

                // create the image at the page's own size
                Rectangle rect = new Rectangle(0, 0, (int) page.getBBox().getWidth(), (int) page.getBBox().getHeight());
                BufferedImage bufferedImage = new BufferedImage(rect.width, rect.height, BufferedImage.TYPE_INT_RGB);

                // arguments: width, height, clip rect, ImageObserver (null),
                // fill background with white, block until drawing is done
                Image image = page.getImage(rect.width, rect.height, rect, null, true, true);
                Graphics2D bufImageGraphics = bufferedImage.createGraphics();
                bufImageGraphics.drawImage(image, 0, 0, null);
                bufImageGraphics.dispose();

                // to use another format (jpg, gif, bmp), change both the extension and the ImageIO format name
                File imageFile = new File(destinationDir + fileName + "_" + pageNumber + ".png");

                ImageIO.write(bufferedImage, "png", imageFile);
                raf.close();

                System.out.println(imageFile.getName() + " File created in: " + destinationFile.getCanonicalPath());
            } else {
                System.err.println(sourceFile.getName() + " File does not exist");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Converted image: Chemistry_cover_62 (image not included)

I get the same result as the OP using PDFBox version 1.8.4. In version 2.0.0-SNAPSHOT, though, it looks better:

[image: the page rendered with PDFBox 2.0.0-SNAPSHOT]

Here only some arrows are thinner and some arrow parts are mis-drawn as boxes.

Thus,

"How do I make PDFBox take something like a direct snapshot image?"

The current release versions (up to 1.8.4) seem to have significant deficiencies when rendering PDFs as images. You may switch to a current development version (e.g. the current trunk, 2.0.0-SNAPSHOT) or wait until the improvements are released.

Furthermore, some minor deficiencies remain even in 2.0.0-SNAPSHOT. You might want to present your sample document to the PDFBox people (i.e. file a corresponding issue in their JIRA) so that they can improve PDFBox further to suit your needs.

"Also, I noticed that the image quality of the PNG is not very good. Is there any way to increase the resolution of the generated image?"

There are convertToImage overloads with resolution parameters. Your current code actually sets the resolution to screenResolution. Increase this resolution value.
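
To make the effect of the resolution value concrete: PDF page dimensions are expressed in points (1/72 inch by default), so the rendered size in pixels is roughly points × dpi / 72. Below is a minimal sketch using only the JDK. The class and method names are made up for illustration, and the 612 × 792 pt page is simply US Letter used as an example, not the size of the page in question.

```java
public class RenderSize {
    /** Pixels along one side of a page rendered at the given resolution. */
    public static int toPixels(double points, int dpi) {
        // 72 points per inch is the default PDF user unit
        return (int) Math.round(points * dpi / 72.0);
    }

    public static void main(String[] args) {
        double widthPt = 612, heightPt = 792; // example page size: US Letter
        for (int dpi : new int[] {72, 96, 300}) {
            System.out.println(dpi + " dpi -> "
                    + toPixels(widthPt, dpi) + " x " + toPixels(heightPt, dpi) + " px");
        }
    }
}
```

At 72 dpi the image has the same number of pixels as the page has points, which is why low resolution values look coarse; 300 dpi gives about four times as many pixels per side.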

PS: The code to render a PDF page to image has been refactored in 2.0.0-SNAPSHOT. Instead of

BufferedImage image =  page.convertToImage();

you now do

BufferedImage image =  RenderUtil.convertToImage(page);

I assume this has been done to remove direct AWT references from the core classes because AWT is not available on e.g. Android.


PPS: The SNAPSHOT I used last year in this answer was merely a snapshot, subject to change. The 2.0.0 release is still under development and many things have changed. In particular, there is no RenderUtil class anymore. Instead, one currently has to use the PDFRenderer in the org.apache.pdfbox.rendering package...
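
For reference, here is a minimal sketch of the PDFRenderer approach as it looks in the released PDFBox 2.0.x API. This is an assumption relative to the SNAPSHOT discussed above, whose API was still changing; in PDFBox 3.x the document is loaded with Loader.loadPDF instead of PDDocument.load. The file names, the page index 67 and the 300 DPI value are placeholders, and the PDFBox 2.0.x JAR must be on the classpath.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

public class RenderPage {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the document even if rendering fails
        try (PDDocument doc = PDDocument.load(new File("Hello World.pdf"))) {
            PDFRenderer renderer = new PDFRenderer(doc);
            // Page indices are zero-based, so index 67 is page number 68.
            BufferedImage image = renderer.renderImageWithDPI(67, 300, ImageType.RGB);
            ImageIO.write(image, "png", new File("out.png"));
        }
    }
}
```

Raising the DPI argument increases the pixel dimensions of the output image, at the cost of more memory and rendering time.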

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow