Question

I am trying to use the AbcPdf .NET component (version 7) to process some PDFs and generate metadata. Is there any way to list all the tags in a PDF document? As an example of a tagged PDF, I am using this file here.

Are there any other components or tools available for listing or extracting PDF tags?

Thanks in advance for your help.

Solution

Use iTextSharp. It's free and you only need the "itextsharp.dll".

http://sourceforge.net/projects/itextsharp/

Here is a simple function for reading the text out of a PDF.

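' Returns the text of every page in the PDF, concatenated in page order.
Public Shared Function GetTextFromPDF(PdfFileName As String) As String
    Dim oReader As New iTextSharp.text.pdf.PdfReader(PdfFileName)
    Dim sOut = ""

    Try
        ' iTextSharp page numbers are 1-based.
        For i = 1 To oReader.NumberOfPages
            ' Use a fresh extraction strategy for each page.
            Dim its As New iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy

            sOut &= iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(oReader, i, its)
        Next
    Finally
        ' Release the file even if extraction throws.
        oReader.Close()
    End Try

    Return sOut
End Function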

iTextSharp can also read a document's tags, which a tagged PDF stores in its structure tree.
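
As a rough illustration of reading tags, the sketch below walks the structure tree of a tagged PDF and collects each structure element's type (for example P, H1 or Table). It assumes iTextSharp 5.x; the names PdfTagLister, ListTags and CollectTags are made up for this example and are not part of the library, and the code has not been run against the file from the question.

```vb
Imports System.Collections.Generic
Imports iTextSharp.text.pdf

' Hypothetical helper module; not part of iTextSharp.
Public Module PdfTagLister

    ' Returns the structure element types found in the document's structure
    ' tree, in tree order. The list is empty when the PDF is not tagged.
    Public Function ListTags(pdfFileName As String) As List(Of String)
        Dim tags As New List(Of String)
        Dim reader As New PdfReader(pdfFileName)
        Try
            ' A tagged PDF has a /StructTreeRoot entry in its catalog.
            Dim root As PdfDictionary = reader.Catalog.GetAsDict(PdfName.STRUCTTREEROOT)
            If root IsNot Nothing Then
                CollectTags(root.GetDirectObject(PdfName.K), tags)
            End If
        Finally
            reader.Close()
        End Try
        Return tags
    End Function

    ' A /K entry can be a single dictionary, an array of kids, or an integer
    ' marked-content ID. Integers are leaves and are skipped.
    Private Sub CollectTags(node As PdfObject, tags As List(Of String))
        If node Is Nothing Then Return

        If node.IsArray() Then
            Dim kids As PdfArray = DirectCast(node, PdfArray)
            For i As Integer = 0 To kids.Size - 1
                CollectTags(kids.GetDirectObject(i), tags)
            Next
        ElseIf node.IsDictionary() Then
            Dim elem As PdfDictionary = DirectCast(node, PdfDictionary)

            ' /S holds the structure type, i.e. the tag name.
            Dim tagName As PdfName = elem.GetAsName(PdfName.S)
            If tagName IsNot Nothing Then
                tags.Add(PdfName.DecodeName(tagName.ToString()))
            End If

            ' Descend into this element's own kids.
            CollectTags(elem.GetDirectObject(PdfName.K), tags)
        End If
    End Sub

End Module
```

Calling PdfTagLister.ListTags("tagged.pdf") (the file name is a placeholder) should return one entry per structure element. The names are whatever the authoring tool wrote, so custom tags may need to be mapped to standard ones through the structure tree root's /RoleMap entry. If your iTextSharp version includes the TaggedPdfReaderTool class in iTextSharp.text.pdf.parser, it may be a simpler way to dump the whole tree as XML.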

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow