Question

I have implemented adding SWF files to a PDF using iTextSharp. Is it possible to do the reverse? For example, if I give a PDF as input, I need to get the SWF files out of it. If so, how can I do that?

Any ideas on how to start would be greatly appreciated.

Kind Regards,

Raghu.M


Solution

This is a working example that takes this PDF with an embedded file (the first one I found):

http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/fileAttachment.pdf

It extracts the embedded files, in this case a KSBASE.WQ2 file.

    using System;
    using System.IO;
    using iTextSharp.text.pdf;

    // Place both methods inside a class of your choice.
    public static void ExtractAttachments(string src, string dir)
    {
        PdfReader reader = new PdfReader(Path.Combine(dir, src));
        try
        {
            PdfDictionary root = reader.Catalog;
            // Each lookup returns null when the PDF has no document-level attachments.
            PdfDictionary names = root.GetAsDict(PdfName.NAMES);
            PdfDictionary embedded = names == null ? null : names.GetAsDict(PdfName.EMBEDDEDFILES);
            PdfArray filespecs = embedded == null ? null : embedded.GetAsArray(PdfName.NAMES);
            if (filespecs == null) return;

            // The array alternates between a name and its file specification dictionary.
            for (int i = 0; i + 1 < filespecs.Size; i += 2)
            {
                ExtractAttachment(reader, dir, filespecs.GetAsString(i), filespecs.GetAsDict(i + 1));
            }
        }
        finally
        {
            reader.Close();
        }
    }

    protected static void ExtractAttachment(PdfReader reader, string dir, PdfString name, PdfDictionary filespec)
    {
        PdfDictionary refs = filespec.GetAsDict(PdfName.EF);
        if (refs == null) return; // this file specification has no embedded stream

        foreach (PdfName key in refs.Keys)
        {
            PRStream stream = (PRStream)PdfReader.GetPdfObject(refs.GetAsIndirectObject(key));
            // The same key is used to look up the file name in the file specification.
            string filename = filespec.GetAsString(key).ToString();
            // here you can do a filename.Contains(".swf") check
            byte[] fileBytes = PdfReader.GetStreamBytes(stream);
            File.WriteAllBytes(Path.Combine(dir, filename), fileBytes);
        }
    }

You would call this as follows:

    var dir = "C:\\temp\\PdfExtract";
    ExtractAttachments("fileAttachment.pdf", dir);

You can simply add a filename.Contains(".swf") check on the file names before extracting.
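As a minimal sketch of such a check: `SwfFilter` and `IsSwf` are hypothetical names, not part of iTextSharp. This version uses a case-insensitive `EndsWith` instead of `Contains`, on the assumption that only the file extension should decide the match.

```csharp
using System;

public static class SwfFilter
{
    // Returns true when the attachment name ends in ".swf", ignoring case.
    // Hypothetical helper for illustration; it is not part of iTextSharp.
    public static bool IsSwf(string filename)
    {
        return filename != null
            && filename.EndsWith(".swf", StringComparison.OrdinalIgnoreCase);
    }
}
```

Inside the extraction loop you would skip the write whenever `SwfFilter.IsSwf(filename)` returns false.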

Update

Ok, this is how I would figure it out if the above approach did not work.

In that case the files must be located somewhere else within the catalog. Without seeing the file, this is how I would approach it.

I would add a breakpoint after root is resolved, then step into it to see whether I could find where the SWF files are.

If you look into root.Keys you will see what the Catalog contains.


To retrieve any dictionary object, you can use the GetAsDict method, passing in the matching PdfName.

Stepping down a level further, you can see that it contains the EmbeddedFiles entry and so forth.
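If you prefer printing to stepping through the debugger, the same inspection could be sketched roughly like this. `CatalogInspector` and `DumpCatalog` are hypothetical names. The sketch assumes iTextSharp 5.x and only goes one level down, into the Names dictionary.

```csharp
using System;
using iTextSharp.text.pdf;

public static class CatalogInspector
{
    // Prints the keys of the catalog and, when present, of its Names dictionary.
    // Hypothetical helper for illustration; it is not part of iTextSharp.
    public static void DumpCatalog(string pdfPath)
    {
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            PdfDictionary root = reader.Catalog;
            foreach (PdfName key in root.Keys)
            {
                Console.WriteLine("Catalog key: " + key);
            }

            // GetAsDict returns null when the key is missing or is not a dictionary.
            PdfDictionary names = root.GetAsDict(PdfName.NAMES);
            if (names != null)
            {
                foreach (PdfName key in names.Keys)
                {
                    Console.WriteLine("Names key: " + key);
                }
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

You can repeat the inner loop for any other dictionary you find, passing its key to GetAsDict.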


There are several PdfName names; there is even a Flash one.

As the structure of any document can be different, it is just a case of investigating the structure and passing the correct parameters to GetAsDict in order to read the files.
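If you cannot work out where the files sit, a different, brute-force technique is to ignore the structure, walk every object in the PDF, and keep the streams whose decoded bytes start with an SWF signature ("FWS", "CWS" or "ZWS"). This is only a sketch and not what the code above does. `SwfScanner`, its method names, and the `object_N.swf` output names are hypothetical, and it assumes iTextSharp 5.x.

```csharp
using System;
using System.IO;
using iTextSharp.text.pdf;

public static class SwfScanner
{
    // SWF data starts with "FWS" (uncompressed), "CWS" (zlib) or "ZWS" (LZMA).
    public static bool LooksLikeSwf(byte[] data)
    {
        return data != null && data.Length >= 3
            && (data[0] == (byte)'F' || data[0] == (byte)'C' || data[0] == (byte)'Z')
            && data[1] == (byte)'W' && data[2] == (byte)'S';
    }

    // Walks every indirect object and saves each stream that looks like SWF.
    // Returns the number of files written.
    public static int ExtractSwfStreams(string pdfPath, string outputDir)
    {
        int count = 0;
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            for (int i = 1; i < reader.XrefSize; i++)
            {
                PdfObject obj = reader.GetPdfObject(i);
                if (obj == null || !obj.IsStream()) continue;

                byte[] bytes;
                try
                {
                    bytes = PdfReader.GetStreamBytes((PRStream)obj);
                }
                catch (Exception)
                {
                    continue; // the stream could not be decoded, so skip it
                }

                if (LooksLikeSwf(bytes))
                {
                    // Hypothetical naming scheme based on the object number.
                    File.WriteAllBytes(Path.Combine(outputDir, "object_" + i + ".swf"), bytes);
                    count++;
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return count;
    }
}
```

The original file names are lost with this approach, so it is best kept as a fallback for when the dictionary route fails.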

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow