Question

Is there any way to extract the text of a specific region using ICEpdf? I was able to extract whole pages, but that's not what I want to do.

(I know PDFBox nicely extracts the text in a specific rectangular area of a page. However, since the image rendering works a lot better in ICEpdf, I'd like to use that library.)


Solution

You can get the text of a page through the document object by calling:

PageText pageText = document.getPageText(pageNumber);

This is similar to the bundled example ./examples/extraction/PageTextExtraction.java

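The PageText object contains all the LineText -> WordText -> GlyphText objects for the page. LineText, WordText and GlyphText all extend AbstractText, which has a getBounds() method. The bounds of these objects are in PDF user space, the 1st geometric quadrant (origin at the bottom left, y increasing upwards). Java2D is in the 4th geometric quadrant (origin at the top left, y increasing downwards), so a rectangle in view coordinates has to be converted to page space before it is compared with the text bounds. Assuming you already have the selectionRectangle, the code would be as follows: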

// clear the currently selected state, ignore highlighted.
currentPage.getViewText().clearSelected();

// get page transform, same for all calculations
AffineTransform pageTransform = currentPage.getPageTransform(
        Page.BOUNDARY_CROPBOX,
        documentViewModel.getViewRotation(),
        documentViewModel.getViewZoom());

// the helper returns a Rectangle2D, or null if the transform cannot be inverted
Rectangle2D pageSpaceSelectRectangle =
        convertRectangleToPageSpace(selectionRectangle, pageTransform);
ArrayList<LineText> pageLines = pageText.getPageLines();
if (pageSpaceSelectRectangle != null && pageLines != null) {
    for (LineText pageLine : pageLines) {
        // check for intersection, if so break into words.
        if (pageLine.getBounds().intersects(pageSpaceSelectRectangle)) {
            // you have some selected text.
        }
    }
}
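The loop above stops at line level. To narrow the result down to the words inside the region, the same intersection test can be repeated one level further down the LineText -> WordText hierarchy. The following is only a sketch of that idea: Word and Line are hypothetical stand-in classes, not ICEpdf types, so that the example runs with nothing but the JDK. With ICEpdf you would apply the same test to the bounds of the real text objects.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RegionTextSketch {

    /** Hypothetical stand-in for a word: its text plus bounds in page space. */
    static final class Word {
        final String text;
        final Rectangle2D bounds;

        Word(String text, double x, double y, double w, double h) {
            this.text = text;
            this.bounds = new Rectangle2D.Double(x, y, w, h);
        }
    }

    /** Hypothetical stand-in for a line: its words plus the union of their bounds. */
    static final class Line {
        final List<Word> words;
        final Rectangle2D bounds;

        Line(Word... words) {
            this.words = Arrays.asList(words);
            Rectangle2D union = (Rectangle2D) words[0].bounds.clone();
            for (Word word : words) {
                union.add(word.bounds);
            }
            this.bounds = union;
        }
    }

    /** Returns the text of all words whose bounds intersect the region, one line per row. */
    static String extractRegion(List<Line> lines, Rectangle2D region) {
        StringBuilder out = new StringBuilder();
        for (Line line : lines) {
            // cheap rejection: skip lines that do not touch the region at all
            if (!line.bounds.intersects(region)) {
                continue;
            }
            List<String> hits = new ArrayList<>();
            for (Word word : line.words) {
                if (word.bounds.intersects(region)) {
                    hits.add(word.text);
                }
            }
            if (!hits.isEmpty()) {
                if (out.length() > 0) {
                    out.append('\n');
                }
                out.append(String.join(" ", hits));
            }
        }
        return out.toString();
    }

    /** Two made-up lines of text, positioned in page space (y grows upwards). */
    static List<Line> sampleLines() {
        return Arrays.asList(
                new Line(new Word("Hello", 72, 700, 28, 12), new Word("world", 104, 700, 30, 12)),
                new Line(new Word("second", 72, 680, 38, 12), new Word("line", 114, 680, 22, 12)));
    }

    public static void main(String[] args) {
        Rectangle2D region = new Rectangle2D.Double(102, 695, 60, 20);
        System.out.println(extractRegion(sampleLines(), region));
    }
}
```

With the sample data, the region in main only touches the word "world" on the first line. Using intersects() picks up words that are partly inside the region; if you only want words that lie fully inside it, region.contains(word.bounds) is the stricter test.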



    /**
     * Converts the rectangle to the space specified by the page transform. This
     * is a utility method for converting a selection rectangle to page space
     * so that an intersection can be calculated to determine a selected state.
     *
     * @param mouseRect     rectangle to convert space of
     * @param pageTransform page transform
     * @return converted rectangle.
     */
    private Rectangle2D convertRectangleToPageSpace(Rectangle mouseRect,
                                                    AffineTransform pageTransform) {
        GeneralPath shapePath;
        try {
            AffineTransform inverseTransform = pageTransform.createInverse();
            shapePath = new GeneralPath(mouseRect);
            shapePath.transform(inverseTransform);
            return shapePath.getBounds2D();
        } catch (NoninvertibleTransformException e) {
            logger.log(Level.SEVERE,
                    "Error converting mouse point to page space.", e);
        }
        return null;
    }
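
To see what the inverse transform in the helper above actually does to a rectangle, here is a small sketch that uses only java.awt.geom. The pageToView method is an assumption made for illustration: it builds a simplified page transform (no rotation, crop box anchored at the origin) that scales by the zoom factor and flips the y axis. It is not the transform ICEpdf computes in getPageTransform, which also accounts for rotation and the page boundary.

```java
import java.awt.Rectangle;
import java.awt.geom.AffineTransform;
import java.awt.geom.NoninvertibleTransformException;
import java.awt.geom.Rectangle2D;

final class PageSpaceSketch {

    /**
     * Simplified, hypothetical page-to-view transform: scales by zoom and
     * flips the y axis so the PDF origin (bottom left) lands at the bottom
     * left of the view. Assumes no rotation and a crop box starting at (0, 0).
     */
    static AffineTransform pageToView(double pageHeight, double zoom) {
        AffineTransform at = new AffineTransform();
        at.translate(0, pageHeight * zoom);
        at.scale(zoom, -zoom);
        return at;
    }

    /** Maps a rectangle in view (Java2D) coordinates back to page space. */
    static Rectangle2D toPageSpace(Rectangle viewRect, AffineTransform pageToView) {
        try {
            AffineTransform inverse = pageToView.createInverse();
            return inverse.createTransformedShape(viewRect).getBounds2D();
        } catch (NoninvertibleTransformException e) {
            throw new IllegalArgumentException("page transform is not invertible", e);
        }
    }

    public static void main(String[] args) {
        // a 792 point high page (US Letter) shown at 200% zoom
        AffineTransform transform = pageToView(792, 2.0);
        Rectangle selection = new Rectangle(100, 200, 300, 50);
        System.out.println(toPageSpace(selection, transform));
    }
}
```

In this example the selection spans y = 200 to 250 in view pixels, measured from the top. In page space it ends up at y = 667 to 692, measured from the bottom, with the width and height halved because of the 200% zoom. That is the rectangle you can compare with the getBounds() values of the text objects.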

Other tips

Have you posted on the ICEpdf forums? They are usually very good at answering questions there.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow