Question

How do I remove metadata from a PDF using Java?

Can iText do this, or do any other frameworks have this ability? I couldn't find any examples or classes in iText that remove metadata. Has anybody done this before, or does anyone have ideas?

Please share your views.

Thanks in advance.


Solution

First, you need to distinguish between the two types of metadata a PDF can contain:

  1. XMP metadata
  2. DID (document information dictionary, the older mechanism)

You remove the first one like this:

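// iText 5: "stamper" is an existing PdfStamper
PdfReader reader = stamper.getReader();
reader.getCatalog().remove(PdfName.METADATA); // drop the catalog's /Metadata (XMP) entry
reader.removeUnusedObjects();                 // discard the now-unreachable XMP stream object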

You remove the second one as SANN3 mentioned:

// A null value in the map passed to setMoreInfo() removes that entry from the Info dictionary
HashMap<String, String> info = reader.getInfo();
info.put("Title", null);
info.put("Author", null);
info.put("Subject", null);
info.put("Keywords", null);
info.put("Creator", null);
info.put("Producer", null);
info.put("CreationDate", null);
info.put("ModDate", null);
info.put("Trapped", null);
stamper.setMoreInfo(info);

If you then open the PDF in a text editor, you won't find either the /Info dictionary or the XMP metadata.
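To repeat that text-editor check programmatically, here is a minimal, stdlib-only sketch. It is only a heuristic: it looks for the literal markers /Info and <x:xmpmeta / <?xpacket in the raw bytes, so it can miss entries hidden inside compressed object streams and can report false positives. The class name MetadataScan is hypothetical, not part of iText or PDFBox.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: a rough byte-level check for leftover PDF metadata. */
public class MetadataScan {

    /** What the scan found; both flags false means no marker was seen. */
    public static final class Findings {
        public final boolean infoDictionary; // an "/Info" key (normally in the trailer)
        public final boolean xmpPacket;      // an XMP packet ("<x:xmpmeta" or "<?xpacket")

        Findings(boolean infoDictionary, boolean xmpPacket) {
            this.infoDictionary = infoDictionary;
            this.xmpPacket = xmpPacket;
        }
    }

    /** Searches the raw bytes for the same markers you would look for in a text editor. */
    public static Findings scan(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary stream data decodes safely.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        boolean info = raw.contains("/Info");
        boolean xmp = raw.contains("<x:xmpmeta") || raw.contains("<?xpacket");
        return new Findings(info, xmp);
    }

    /** Convenience overload that reads the whole file into memory first. */
    public static Findings scan(Path pdf) throws IOException {
        return scan(Files.readAllBytes(pdf));
    }
}
```

For example, MetadataScan.scan(Paths.get("sample1.pdf")) on the stamped output should report both flags as false if the removal worked (subject to the caveats above).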

Other tips

Try this code:

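import com.itextpdf.text.pdf.PdfReader;   // iText 5.x packages
import com.itextpdf.text.pdf.PdfStamper;
import java.io.FileOutputStream;
import java.util.HashMap;

PdfReader readInputPDF = new PdfReader("sample.pdf");
HashMap<String, String> hMap = readInputPDF.getInfo();
PdfStamper stamper = new PdfStamper(readInputPDF, new FileOutputStream("sample1.pdf"));
hMap.put("Author", null); // null value = remove this entry from the Info dictionary
stamper.setMoreInfo(hMap);
stamper.close();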

Add each metadata property you want to remove from the PDF to the map, with a null value.
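If you want to clear every entry the file currently has, including custom (non-standard) keys, rather than listing them one by one, a small helper can build that map for you. This is a sketch: InfoScrubber and nullOutAll are hypothetical names, and the helper relies on the behavior described above, that a null value passed to setMoreInfo removes the entry.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for building a "remove everything" map for setMoreInfo(). */
public final class InfoScrubber {

    private InfoScrubber() {}

    /**
     * Returns a new map holding every key of {@code info} mapped to null.
     * The input map is left unchanged.
     */
    public static HashMap<String, String> nullOutAll(Map<String, String> info) {
        HashMap<String, String> cleared = new HashMap<>();
        for (String key : info.keySet()) {
            cleared.put(key, null); // null marks the entry for removal
        }
        return cleared;
    }
}
```

With the code above, you would call stamper.setMoreInfo(InfoScrubber.nullOutAll(hMap)) instead of putting individual keys. Returning a copy rather than mutating hMap keeps the original values available, for example for logging what was removed.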

Update for those using PDFBox 2.x:

import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

File pdf = new File("student-scorecard.pdf");
PDDocument document = PDDocument.load(pdf);
PDDocumentInformation information = document.getDocumentInformation();
if (information != null) {
  document.setDocumentInformation(new PDDocumentInformation()); // replace Info with an empty dictionary
  document.save("student-scorecard-anonymized.pdf");
}
document.getDocumentCatalog().setMetadata(null); // Tilman Hausherr
document.close();

Thanks, @Magnus, for the PDFBox 2.x update.

In my case, I had to change the order to get it to work, so that the XMP metadata is removed before the document is saved:

File pdf = new File("student-scorecard.pdf");
PDDocument document = PDDocument.load(pdf);
PDDocumentInformation information = document.getDocumentInformation();
if (information != null) {
    document.setDocumentInformation(new PDDocumentInformation());
    document.getDocumentCatalog().setMetadata(null); // Tilman Hausherr
    document.save("student-scorecard-anonymized.pdf");
}
document.close();

Licensed under: CC-BY-SA with attribution