Question

Is it possible to extract the text of a specific region using ICEpdf? I have been able to extract whole pages, but that is not what I want to do.

(I know that PDFBox can extract the text within a specific rectangular area of a page. However, since image rendering works much better in ICEpdf, I would like to use that library.)


Solution

You can get the text of a page by calling the following method on the Document object:

PageText pageText = document.getPageText(pageNumber);

This is similar to the bundled example ./examples/extraction/PageTextExtraction.java

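The PageText object contains all the LineText -> WordText -> GlyphText objects for the page. LineText, WordText and GlyphText all extend AbstractText, which has a getBounds() method. The bounds of these objects are in PDF user space, the 1st geometric quadrant (origin at the bottom left, y increasing upward), whereas Java2D uses the 4th geometric quadrant (origin at the top left, y increasing downward), so a rectangle selected on screen has to be converted to page space before it can be compared with the text bounds. Assuming you already have the selectionRectangle, the code would be as follows: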

// clear the currently selected state, ignore highlighted.
currentPage.getViewText().clearSelected();

// get page transform, same for all calculations
AffineTransform pageTransform = currentPage.getPageTransform(
        Page.BOUNDARY_CROPBOX,
        documentViewModel.getViewRotation(),
        documentViewModel.getViewZoom());

// convert the selection rectangle from view space to page space
Rectangle2D pageSpaceSelectRectangle =
        convertRectangleToPageSpace(selectionRectangle, pageTransform);
ArrayList<LineText> pageLines = pageText.getPageLines();
for (LineText pageLine : pageLines) {
    // check for intersection, if so break into words.
    if (pageLine.getBounds().intersects(pageSpaceSelectRectangle)) {
        // you have some selected text.
    }
}
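The loop above stops at the line level. To keep only the words that fall inside the region, the same intersection test can be repeated one level down. The following is a minimal, self-contained sketch of that idea, not ICEpdf code: the `Line` and `Word` classes are hypothetical stand-ins for ICEpdf's LineText and WordText (a bounding box plus content), and `extractRegion` is a name made up for this illustration. Check the accessor names for a line's words and a word's text against your ICEpdf version before adapting it.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

class RegionTextSketch {

    // Hypothetical stand-in for ICEpdf's WordText: text plus bounds in page space.
    static class Word {
        final String text;
        final Rectangle2D.Float bounds;
        Word(String text, float x, float y, float w, float h) {
            this.text = text;
            this.bounds = new Rectangle2D.Float(x, y, w, h);
        }
    }

    // Hypothetical stand-in for ICEpdf's LineText: line bounds plus its words.
    static class Line {
        final Rectangle2D.Float bounds;
        final List<Word> words = new ArrayList<>();
        Line(float x, float y, float w, float h) {
            this.bounds = new Rectangle2D.Float(x, y, w, h);
        }
    }

    // Returns the words whose bounds intersect the region, one output line per text line.
    static String extractRegion(List<Line> lines, Rectangle2D region) {
        StringBuilder out = new StringBuilder();
        for (Line line : lines) {
            if (!line.bounds.intersects(region)) {
                continue; // cheap rejection: the whole line is outside the region
            }
            StringBuilder lineOut = new StringBuilder();
            for (Word word : line.words) {
                if (word.bounds.intersects(region)) {
                    if (lineOut.length() > 0) lineOut.append(' ');
                    lineOut.append(word.text);
                }
            }
            // a line can touch the region without any of its words doing so
            if (lineOut.length() > 0) {
                if (out.length() > 0) out.append('\n');
                out.append(lineOut);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Line first = new Line(50, 700, 300, 12);
        first.words.add(new Word("Invoice", 50, 700, 60, 12));
        first.words.add(new Word("Total", 115, 700, 40, 12));
        first.words.add(new Word("42.00", 300, 700, 50, 12));
        Line second = new Line(50, 680, 300, 12);
        second.words.add(new Word("Thanks", 50, 680, 60, 12));

        List<Line> lines = new ArrayList<>();
        lines.add(first);
        lines.add(second);

        // region already expressed in page space
        Rectangle2D region = new Rectangle2D.Float(290, 690, 100, 30);
        System.out.println(extractRegion(lines, region)); // prints 42.00
    }
}
```

Using intersects() picks up words that are only partly inside the region; if you want words that lie fully inside it, region.contains(word.bounds) would be the stricter test.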



    /**
     * Converts the rectangle to the space specified by the page transform. This
     * is a utility method for converting a selection rectangle to page space
     * so that an intersection can be calculated to determine a selected state.
     *
     * @param mouseRect     rectangle to convert space of
     * @param pageTransform page transform
     * @return converted rectangle.
     */
    private Rectangle2D convertRectangleToPageSpace(Rectangle mouseRect,
                                                    AffineTransform pageTransform) {
        GeneralPath shapePath;
        try {
            AffineTransform transform = pageTransform.createInverse();
            shapePath = new GeneralPath(mouseRect);
            shapePath.transform(transform);
            return shapePath.getBounds2D();
        } catch (NoninvertibleTransformException e) {
            logger.log(Level.SEVERE,
                    "Error converting mouse point to page space.", e);
        }
        return null;
    }
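To see what the conversion does with concrete numbers, here is a small standalone sketch using only java.awt.geom. The transform built by `simplePageTransform` is an assumption made for illustration (a y-flip plus zoom, with no rotation and no crop box offset), not what ICEpdf's getPageTransform actually returns, and both method names are hypothetical. It uses AffineTransform.createTransformedShape rather than a GeneralPath, which should give the same bounds.

```java
import java.awt.Rectangle;
import java.awt.geom.AffineTransform;
import java.awt.geom.NoninvertibleTransformException;
import java.awt.geom.Rectangle2D;

class PageSpaceSketch {

    // Simplified page-to-view transform (assumption): flip y around the page
    // height, then apply the zoom. Concatenated transforms apply last-added first.
    static AffineTransform simplePageTransform(double pageHeight, double zoom) {
        AffineTransform at = new AffineTransform();
        at.scale(zoom, zoom);         // applied third: zoom
        at.translate(0, pageHeight);  // applied second: shift up by the page height
        at.scale(1, -1);              // applied first: flip the y axis
        return at;
    }

    // Maps a rectangle from view space (Java2D) back to page space (PDF user space).
    static Rectangle2D toPageSpace(Rectangle viewRect, AffineTransform pageTransform) {
        try {
            AffineTransform inverse = pageTransform.createInverse();
            return inverse.createTransformedShape(viewRect).getBounds2D();
        } catch (NoninvertibleTransformException e) {
            // fail loudly instead of returning null to the caller
            throw new IllegalArgumentException("Page transform is not invertible", e);
        }
    }

    public static void main(String[] args) {
        // US Letter page height in points, viewed at 200% zoom
        AffineTransform pageTransform = simplePageTransform(792, 2.0);
        Rectangle selection = new Rectangle(100, 200, 300, 50);
        Rectangle2D pageRect = toPageSpace(selection, pageTransform);
        // x = 100 / 2 = 50, width = 300 / 2 = 150, height = 50 / 2 = 25,
        // y = 792 - (200 + 50) / 2 = 667 (lower edge, since y now points up)
        System.out.println(pageRect.getX() + " " + pageRect.getY() + " "
                + pageRect.getWidth() + " " + pageRect.getHeight());
    }
}
```

One design difference worth noting: the helper above returns null when the transform cannot be inverted, and passing that null to intersects() would fail with a NullPointerException later on. Throwing at the point of failure, as in this sketch, is one way to avoid that; a null check in the caller is another.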

Other tips

Have you posted on the ICEpdf forums? They are usually very good at answering questions there.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow