Question

I have a program that searches all the PDF files in a folder for some text, for example a sentence. It works perfectly.

I would now like to add a feature that opens the PDF at the exact page where that sentence appears. I looked through the PDFBox documentation and could not find anything specific for this.

I may have overlooked something, so if somebody could point me in the right direction I would be very grateful.

Thank you

Solution

I read your question earlier this week. At the time, I didn't have an answer for you. Then I stumbled on the methods setStartPage() and setEndPage() in the PDFBox documentation for the PDFTextStripper class, which made me think of your question and led to this answer. It's been about 4 months since you asked, but maybe this will help someone. I know I learned a thing or two while writing it.

When you extract text from a PDF file with PDFTextStripper, you can restrict the extraction to a range of pages. The methods setStartPage() and setEndPage() set that range. If we set the start page and the end page to the same page number, the stripper returns the text of exactly one page, so whenever the search term matches we know which page it was found on.

In the code below I am using a Windows Forms application, but you can adapt the code to fit your own application.

using System;
using System.Windows.Forms;
using org.apache.pdfbox.pdmodel;
using org.apache.pdfbox.util;
//The Diagnostics namespace is needed to specify PDF open parameters. More on them later.
using System.Diagnostics;
//specify the string you are searching for
//(keep it lowercase, because the page text is converted to lowercase below)
string searchTerm = "golden";
//I am using a static file path
string pdfFilePath = @"F:\myFile.pdf";
//load the document
PDDocument document = PDDocument.load(pdfFilePath);
//get the number of pages
int numberOfPages = document.getNumberOfPages();
//create an instance of text stripper to get text from pdf document
PDFTextStripper stripper = new PDFTextStripper();
//loop through all the pages. We will search page by page
for (int pageNumber = 1; pageNumber <= numberOfPages; pageNumber++)
{
    //set the start page
    stripper.setStartPage(pageNumber);
    //set the end page
    stripper.setEndPage(pageNumber);
    //get the text from the page range we set above.
    //in this case we are searching one page.
    //I used the ToLower method to make all the text lowercase, so the search ignores case
    string pdfText = stripper.getText(document).ToLower();
    //just for fun, display the text of each page in a message box.
    //My pdf file only has two pages; remove this line if your files have more.
    MessageBox.Show(pdfText);
    //search the pdfText for the search term
    if (pdfText.Contains(searchTerm))
    {
        //just for fun, display the page number on which we found the search term
        MessageBox.Show("Found the search term on page " + pageNumber);
        //create a process. We will be opening the pdf document to a specific page number
        Process myProcess = new Process();
        //I specified Adobe Acrobat as the program to open
        myProcess.StartInfo.FileName = "Acrobat.exe";
        //see link below for info on PDF document open parameters
        //note the space after the quoted open parameters and the quotes around the file path
        myProcess.StartInfo.Arguments = "/A \"page=" + pageNumber + "=OpenActions\" \"" + pdfFilePath + "\"";
        //Start the process
        myProcess.Start();
        //break out of the loop. we found our search term and we opened the PDF file
        break;
    }
}
//close the document we opened.
document.close();
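
The loop above stops at the first page that contains the search term. If you would rather know every page that matches, the page-by-page idea can be sketched as a small helper that works on text you have already extracted. This is only a sketch: PageSearch and FindPages are hypothetical names, not part of PDFBox, and the helper assumes you have filled a list with one string per page (for example by calling getText once per page as shown above). It compares with StringComparison.OrdinalIgnoreCase instead of ToLower.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of PDFBox.
public static class PageSearch
{
    // Returns the 1-based numbers of all pages whose text contains
    // searchTerm, ignoring case. pageTexts[0] is the text of page 1.
    public static List<int> FindPages(IReadOnlyList<string> pageTexts, string searchTerm)
    {
        if (pageTexts == null)
            throw new ArgumentNullException(nameof(pageTexts));
        if (string.IsNullOrEmpty(searchTerm))
            throw new ArgumentException("A search term is required.", nameof(searchTerm));

        var matches = new List<int>();
        for (int i = 0; i < pageTexts.Count; i++)
        {
            // Treat a missing page text as an empty page.
            string text = pageTexts[i] ?? string.Empty;
            if (text.IndexOf(searchTerm, StringComparison.OrdinalIgnoreCase) >= 0)
                matches.Add(i + 1); // PDF page numbers start at 1
        }
        return matches;
    }
}
```

You could then open the document at the first entry of the returned list, or show the whole list to the user and let them pick a page.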

Check out this Adobe PDF document on the open parameters you can pass when opening a PDF file: http://partners.adobe.com/public/developer/en/acrobat/PDFOpenParameters.pdf
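
The argument string passed to Acrobat is easy to get wrong: there has to be a space between the quoted open parameters and the file path, and the path itself should be quoted in case it contains spaces. A minimal sketch of a helper that builds the string is below. PdfOpenArguments and ForPage are hypothetical names of my own, and the "page=N=OpenActions" form is simply the one used in the code above, so check it against the Adobe document for your Acrobat version.

```csharp
using System;

// Hypothetical helper for building the Acrobat command-line arguments.
public static class PdfOpenArguments
{
    // Builds: /A "page=<pageNumber>=OpenActions" "<pdfFilePath>"
    // Assumption: same parameter form as in the answer's code.
    public static string ForPage(int pageNumber, string pdfFilePath)
    {
        if (pageNumber < 1)
            throw new ArgumentOutOfRangeException(nameof(pageNumber), "Page numbers start at 1.");
        if (string.IsNullOrEmpty(pdfFilePath))
            throw new ArgumentException("A file path is required.", nameof(pdfFilePath));

        // Quote the path so that folders or file names with spaces still work.
        return "/A \"page=" + pageNumber + "=OpenActions\" \"" + pdfFilePath + "\"";
    }
}
```

In the loop above you would then write myProcess.StartInfo.Arguments = PdfOpenArguments.ForPage(pageNumber, pdfFilePath);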

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow