Question

I'm using PDFBox for the first time, and I'm reading the documentation on the PDFBox website.

In short, I have a PDF like this:

[Image: screenshot of the PDF form]

The difference is that my file has many more components of various types (text fields, radio buttons, check boxes). For this PDF I need to read the values Mauro, Rossi and MyCompany. So far I have written the following code:

PDDocument pdDoc = PDDocument.loadNonSeq( myFile, null );
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm pdAcroForm = pdCatalog.getAcroForm();

for(PDField pdField : pdAcroForm.getFields()){
    System.out.println(pdField.getValue());
}

Is this the correct way to read the values of the form components? Any suggestions? Where can I learn more about PDFBox?

Solution

The code you have should work for printing every value. If you actually want to do something with particular values, you'll need other methods. For example, you can get a specific field by its fully qualified name with pdAcroForm.getField(<fieldName>):

PDField firstNameField = pdAcroForm.getField("firstName");
PDField lastNameField = pdAcroForm.getField("lastName");
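getField() expects a fully qualified name. In the PDF specification, a field's fully qualified name is its parent's fully qualified name, a period, and its own partial name, so a field `firstName` nested under a non-terminal field `person` would be looked up as `person.firstName` rather than `firstName`. The sketch below models that resolution in plain Java so it runs without PDFBox; `FieldLookup`, `Node` and the `person` grouping are hypothetical and are not part of the PDFBox API.

```java
import java.util.List;
import java.util.Optional;

class FieldLookup {
    // Hypothetical stand-in for a form field (NOT a PDFBox class): a partial name,
    // a value (used by terminal fields only) and child fields (non-terminal fields).
    record Node(String partialName, String value, List<Node> children) {
        static Node terminal(String name, String value) {
            return new Node(name, value, List.of());
        }

        static Node group(String name, Node... kids) {
            return new Node(name, null, List.of(kids));
        }
    }

    // Resolves a dotted fully qualified name such as "person.firstName"
    // by matching one partial name per level, starting at the top-level fields.
    static Optional<Node> find(List<Node> topLevel, String fullyQualifiedName) {
        List<Node> level = topLevel;
        Node current = null;
        for (String part : fullyQualifiedName.split("\\.")) {
            current = null;
            for (Node candidate : level) {
                if (part.equals(candidate.partialName())) {
                    current = candidate;
                    break;
                }
            }
            if (current == null) {
                return Optional.empty(); // nothing with this partial name at this level
            }
            level = current.children();
        }
        return Optional.ofNullable(current);
    }

    public static void main(String[] args) {
        List<Node> form = List.of(
                Node.group("person",
                        Node.terminal("firstName", "Mauro"),
                        Node.terminal("lastName", "Rossi")),
                Node.terminal("company", "MyCompany"));
        System.out.println(find(form, "person.firstName").map(Node::value).orElse("?"));
        System.out.println(find(form, "company").map(Node::value).orElse("?"));
        System.out.println(find(form, "firstName").isPresent());
    }
}
```

Running `main` prints `Mauro`, `MyCompany` and `false`: the bare partial name `firstName` does not match anything at the top level.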

Note that PDField is only a base class. You can cast a field to its concrete subclass to get type-specific information from it. For example:

// PDCheckbox in PDFBox 1.8; the class is named PDCheckBox in PDFBox 2.0
PDCheckbox fullTimeSalary = (PDCheckbox) pdAcroForm.getField("fullTimeSalary");
if(fullTimeSalary.isChecked()) {
    log.debug("The person earns a full-time salary");
} else {
    log.debug("The person does not earn a full-time salary");
}
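The same idea applies to every field type: you work with the base type and narrow it to the concrete class before calling type-specific methods. In PDFBox 2.0 those concrete classes include PDTextField, PDCheckBox and PDRadioButton. The sketch below shows the pattern in plain Java (17 or later, for sealed types and pattern-matching `instanceof`) with hypothetical stand-in types, so the `FormField` records are not PDFBox classes.

```java
import java.util.List;

class FieldDispatch {
    // Hypothetical stand-ins for concrete field classes (NOT PDFBox types).
    sealed interface FormField permits TextField, CheckBox, RadioGroup {}

    record TextField(String name, String text) implements FormField {}

    record CheckBox(String name, boolean checked) implements FormField {}

    record RadioGroup(String name, List<String> options, String selected) implements FormField {}

    // Takes the common base type and narrows it with pattern-matching instanceof,
    // the same idea as casting a PDField to its concrete subclass.
    static String describe(FormField field) {
        if (field instanceof TextField t) {
            return t.name() + " = \"" + t.text() + "\"";
        } else if (field instanceof CheckBox c) {
            return c.name() + (c.checked() ? " is checked" : " is not checked");
        } else if (field instanceof RadioGroup r) {
            return r.name() + " = " + r.selected() + " (one of " + r.options() + ")";
        }
        throw new IllegalArgumentException("unknown field type: " + field);
    }

    public static void main(String[] args) {
        List<FormField> fields = List.of(
                new TextField("firstName", "Mauro"),
                new CheckBox("fullTimeSalary", true),
                new RadioGroup("role", List.of("employee", "contractor"), "employee"));
        for (FormField f : fields) {
            System.out.println(describe(f));
        }
    }
}
```

`main` prints one line per field, for example `fullTimeSalary is checked`.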

As you suggest, you'll find more information on the Apache PDFBox website.

Other suggestions

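A top-level field does not necessarily hold a value itself: it can be a non-terminal field whose only job is to group child fields. So you need to recurse into the children until you reach terminal fields, and only then read the value. The code snippet below walks through all the fields and outputs each field's name and value. It uses the PDFBox 2.0 API (PDNonTerminalField, getValueAsString()).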

import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDNonTerminalField;

// from your original code, updated for PDFBox 2.0:
// PDDocument.loadNonSeq() from 1.8 was removed, so load() is used instead
try (PDDocument pdDoc = PDDocument.load(myFile))
{
    PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
    PDAcroForm pdAcroForm = pdCatalog.getAcroForm(); // null if the PDF has no form

    // get the top-level fields of the form
    List<PDField> fields = pdAcroForm.getFields();
    System.out.println(fields.size() + " top-level fields were found on the form");

    // inspect field values, descending into non-terminal fields
    for (PDField field : fields)
    {
        processField(field, "|--");
    }

    ...
}


private void processField(PDField field, String sLevel) throws IOException
{
    // getFullyQualifiedName() returns the dotted path, e.g. "parent.child",
    // so a top-level terminal field is not printed as "name.name"
    String fullName = field.getFullyQualifiedName();

    if (field instanceof PDNonTerminalField)
    {
        // a non-terminal field only groups its children: print its name, then recurse
        System.out.println(sLevel + fullName);

        for (PDField child : ((PDNonTerminalField) field).getChildren())
        {
            processField(child, "|  " + sLevel);
        }
    }
    else
    {
        // terminal field (text field, check box, radio button, ...): output the value
        String fieldValue = field.getValueAsString();
        StringBuilder outputString = new StringBuilder(sLevel);
        outputString.append(fullName);
        outputString.append(" = ").append(fieldValue);
        outputString.append(",  type=").append(field.getClass().getName());
        System.out.println(outputString);
    }
}
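The recursion above can also collect the values into a map keyed by fully qualified name instead of printing them, which makes it easy to pick out values such as Mauro, Rossi and MyCompany afterwards. The sketch below models the walk in plain Java without PDFBox; the `Field` record is a hypothetical stand-in for PDField and, unlike PDFBox, it treats any field without children as terminal and assumes every field has a partial name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FieldWalk {
    // Hypothetical stand-in for PDField (NOT a PDFBox class). In this model a field
    // with no children is treated as terminal; PDFBox distinguishes them by class.
    record Field(String partialName, String value, List<Field> children) {
        boolean isTerminal() {
            return children.isEmpty();
        }
    }

    // Depth-first walk that builds each fully qualified name as
    // parent name + "." + partial name and records every terminal field's value.
    static Map<String, String> collect(List<Field> topLevel) {
        Map<String, String> values = new LinkedHashMap<>(); // keeps document order
        for (Field field : topLevel) {
            walk(field, null, values);
        }
        return values;
    }

    private static void walk(Field field, String parentName, Map<String, String> out) {
        String name = (parentName == null)
                ? field.partialName()
                : parentName + "." + field.partialName();
        if (field.isTerminal()) {
            out.put(name, field.value());
        } else {
            for (Field child : field.children()) {
                walk(child, name, out);
            }
        }
    }

    public static void main(String[] args) {
        List<Field> form = List.of(
                new Field("person", null, List.of(
                        new Field("firstName", "Mauro", List.of()),
                        new Field("lastName", "Rossi", List.of()))),
                new Field("company", "MyCompany", List.of()));
        System.out.println(collect(form));
    }
}
```

Running `main` prints `{person.firstName=Mauro, person.lastName=Rossi, company=MyCompany}`; the `person` grouping is a hypothetical layout, not taken from the form in the question.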
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow