Question

I've been told I might not have access to the registry, or to the programs that usually install their IFilters onto the system, so I have to include the IFilter DLLs in the application and load them directly from there. I'm currently using CodeProject's C# IFilter classes, but there are still a few things that are over my head when it comes to the filterPersistClass, persistentHandlerClass and COM, so I am a bit lost on how to get this to work.

I've done all the mundane stuff, like getting the DLLs and setting up a resource file with "Extension, DLL Path" entries, but I just can't seem to get a grasp on how to load the IFilter DLL. Maybe I should just start from scratch, but I thought I would ask for some help first.

EDIT (Partial Solution)

Well I figured out how to load query.dll using the code below in the FilterReader constructor in FilterReader.cs, though I'm having problems now loading the PDFFilter.dll file and am getting the following error:

Unable to find an entry point named 'LoadIFilter' in DLL 'C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin\PDFFilter.dll'

The problem I think I am now stuck at is that PDFFilter.dll uses STA and C# applications are MTA.

[DllImport("query.dll", SetLastError = true, CharSet = CharSet.Unicode)]
static extern int LoadIFilter(string pwcsPath, [MarshalAs(UnmanagedType.IUnknown)] ref object pUnkOuter, ref IFilter ppIUnk);

// --------------------------- constructor ----------------------------------

var isFilter = false;
object iUnknown = null;

LoadIFilter(fileName, ref iUnknown, ref _filter);

var persistFile = (_filter as IPersistFile);
if (persistFile != null)
{
    persistFile.Load(fileName, 0);
    IFILTER_FLAGS flags;
    IFILTER_INIT iflags =
        IFILTER_INIT.CANON_HYPHENS |
        IFILTER_INIT.CANON_PARAGRAPHS |
        IFILTER_INIT.CANON_SPACES |
        IFILTER_INIT.APPLY_INDEX_ATTRIBUTES |
        IFILTER_INIT.HARD_LINE_BREAKS |
        IFILTER_INIT.FILTER_OWNED_VALUE_OK;

    if (_filter.Init(iflags, 0, IntPtr.Zero, out flags) == IFilterReturnCode.S_OK)
        isFilter = true;
}

if (_filter != null && isFilter) return;
if (_filter != null) Marshal.ReleaseComObject(_filter);

Solution

There is nothing magical about IFilter objects. They are housed in standard COM DLLs. In the end, all you need is the clsid of the class that knows how to process PDF files.

The LoadIFilter function in query.dll is just a convenient helper function. Everything it does you can do yourself.

There is a standard way, in the registry, in which a file extension (e.g. .pdf) is resolved to a clsid (e.g. {E8978DA6-047F-4E3D-9C78-CDBE46041603}).

Note: You could also just skip to the end, and know that the clsid of Adobe's IFilter implementation is {E8978DA6-047F-4E3D-9C78-CDBE46041603}. But that value is not guaranteed, so to be safe you need to look it up in the registry.

The algorithm to resolve an .ext to the clsid of an object that implements IFilter is:

GetIFilterClassForFileExtension(String extension)
    arguments:
        extension (String) e.g. ".pdf"
    returns:
        clsid (Guid) e.g. {E8978DA6-047F-4E3D-9C78-CDBE46041603}

    //Get the Persistent Handler for this extension
    //e.g.
    //   HKLM\Software\Classes\.pdf\PersistentHandler\(Default)
    //returns
    //   "{F6594A6D-D57F-4EFD-B2C3-DCD9779E382E}"
    persistentHandlerGuid = HKLM\Software\Classes\<extension>\PersistentHandler\(Default)

    //Get the clsid associated with this persistent handler
    //e.g.
    //   HKLM\Software\Classes\CLSID\{F6594A6D-D57F-4EFD-B2C3-DCD9779E382E}\PersistentAddinsRegistered\{89BCB740-6119-101A-BCB7-00DD010655AF}
    //where the final guid is the interface identifier (IID) of IFilter
    clsid = HKLM\Software\Classes\CLSID\<persistentHandlerGuid>\PersistentAddinsRegistered\{89BCB740-6119-101A-BCB7-00DD010655AF}\(Default)

    //e.g. returns "{E8978DA6-047F-4E3D-9C78-CDBE46041603}", the clsid of Adobe's PDF IFilter
    return clsid
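
For reference, the lookup above could be sketched in C# roughly as follows. This is only a sketch: the class name IFilterRegistry and the helper ReadDefault are made up for illustration, it returns a nullable Guid instead of throwing when nothing is registered, and it assumes the process can read HKEY_LOCAL_MACHINE. A 32-bit process on 64-bit Windows may see a different registry view than a 64-bit one.

```csharp
using System;
using Microsoft.Win32;

// Hypothetical helper; not part of the CodeProject classes.
static class IFilterRegistry
{
    // IID of the IFilter interface, used as a subkey name under PersistentAddinsRegistered.
    const string IFilterIid = "{89BCB740-6119-101A-BCB7-00DD010655AF}";
    const string ClassesRoot = @"HKEY_LOCAL_MACHINE\Software\Classes";

    // Reads the (Default) value of a key; yields null if the key or value is missing.
    static string ReadDefault(string keyPath)
    {
        return Registry.GetValue(keyPath, "", null) as string;
    }

    // Resolves e.g. ".pdf" to the clsid of the registered IFilter, or null if none is found.
    public static Guid? GetIFilterClassForFileExtension(string extension)
    {
        // Step 1: extension -> persistent handler guid
        string handler = ReadDefault(ClassesRoot + @"\" + extension + @"\PersistentHandler");
        if (handler == null) return null;

        // Step 2: persistent handler guid -> clsid of the IFilter implementation
        string clsid = ReadDefault(ClassesRoot + @"\CLSID\" + handler +
                                   @"\PersistentAddinsRegistered\" + IFilterIid);

        Guid result;
        return Guid.TryParse(clsid, out result) ? result : (Guid?)null;
    }
}
```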

Once you have the clsid of the appropriate object, you create it with:

Guid clsid = GetIFilterClassForFileExtension(".pdf");
IFilter filter = CreateComObject(clsid) as IFilter;

You now have the entire guts of the LoadIFilter function from query.dll:

IFilter LoadIFilter(String filename)
{
   String extension = ExtractFileExt(filename); //e.g. "foo.pdf" --> ".pdf"
   Guid clsid = GetIFilterClassForFileExtension(extension);
   return CreateComObject(clsid) as IFilter;
}

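Now, all of that still requires the registry, because you still have to be able to resolve an extension into a clsid. If you already know the clsid, you can skip that lookup. (Note that CoCreateInstance itself still locates the DLL through the class's InprocServer32 registry entry, so the DLL has to be registered for this to work.)

IFilter adobeIFilterForPdfs = CreateComObject(new Guid("{E8978DA6-047F-4E3D-9C78-CDBE46041603}")) as IFilter;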

And you're good to go.

The important point is that the function you're trying to call, LoadIFilter, is not inside Adobe's DLL (or any other vendor's IFilter DLL, for any other file type). LoadIFilter is exported by query.dll, and is simply a helper function for the steps I described above.
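
What an in-process COM DLL does export is DllGetClassObject, which hands out a class factory for a given clsid. If the DLL is not registered at all, a possible approach is to load it yourself and call that export directly. The following is a sketch under several assumptions: ComDllLoader and CreateFromDll are names invented here, the DLL's bitness matches your process, its own dependencies can be found, and the module is deliberately never unloaded. Because COM's own activation is bypassed, the class's registered threading model is not applied for you, so you may need to call this from an STA thread.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

// Standard COM IClassFactory interface, declared here for interop.
[ComImport, Guid("00000001-0000-0000-C000-000000000046"),
 InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
interface IClassFactory
{
    [PreserveSig]
    int CreateInstance([MarshalAs(UnmanagedType.IUnknown)] object pUnkOuter, ref Guid riid,
                       [MarshalAs(UnmanagedType.IUnknown)] out object ppvObject);
    [PreserveSig]
    int LockServer(bool fLock);
}

// Hypothetical helper; not part of the CodeProject classes.
static class ComDllLoader
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr LoadLibrary(string lpFileName);

    [DllImport("kernel32.dll", CharSet = CharSet.Ansi, ExactSpelling = true, SetLastError = true)]
    static extern IntPtr GetProcAddress(IntPtr hModule, string lpProcName);

    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    delegate int DllGetClassObjectDelegate(ref Guid rclsid, ref Guid riid,
        [MarshalAs(UnmanagedType.Interface)] out IClassFactory ppv);

    // Creates an instance of clsid straight from the DLL at dllPath, without CoCreateInstance.
    public static object CreateFromDll(string dllPath, Guid clsid)
    {
        IntPtr module = LoadLibrary(dllPath);
        if (module == IntPtr.Zero)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        IntPtr proc = GetProcAddress(module, "DllGetClassObject");
        if (proc == IntPtr.Zero)
            throw new EntryPointNotFoundException("DllGetClassObject is not exported by " + dllPath);

        var getClassObject = (DllGetClassObjectDelegate)
            Marshal.GetDelegateForFunctionPointer(proc, typeof(DllGetClassObjectDelegate));

        // Ask the DLL for the class factory of the requested clsid.
        Guid iidClassFactory = typeof(IClassFactory).GUID;
        IClassFactory factory;
        Marshal.ThrowExceptionForHR(getClassObject(ref clsid, ref iidClassFactory, out factory));

        // Ask the factory for a new object, requesting IUnknown.
        Guid iidUnknown = new Guid("00000000-0000-0000-C000-000000000046");
        object instance;
        Marshal.ThrowExceptionForHR(factory.CreateInstance(null, ref iidUnknown, out instance));
        Marshal.ReleaseComObject(factory);
        return instance;
    }
}
```

The returned object can then be cast to the IFilter interface declared in the CodeProject classes, for example by passing the path to PDFFilter.dll together with the Adobe clsid mentioned above, and initialized through IPersistFile as in the question's constructor.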

All IFilter DLLs are COM DLLs. The documented way to create an object from a COM DLL is through the CoCreateInstance function:

IUnknown CreateComObject(Guid ClassID)
{
   IUnknown unk;

   HRESULT hr = CoCreateInstance(ClassID, null, CLSCTX_INPROC_SERVER | CLSCTX_LOCAL_SERVER, IID_IUnknown, out unk);
   if (Failed(hr))
      throw new Exception("Could not create instance: "+hr);
   return unk;
}

I'll leave it to you to find the correct way to create a COM object from C# managed code. I've forgotten.
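
For what it's worth, one common way to do this from managed code is Type.GetTypeFromCLSID followed by Activator.CreateInstance. A minimal sketch, assuming the class is registered on the machine (the wrapper name ComActivation is invented here):

```csharp
using System;

// Hypothetical helper; a managed stand-in for the CreateComObject pseudocode.
static class ComActivation
{
    public static object CreateComObject(Guid clsid)
    {
        // Gets a runtime type that represents the COM class with this clsid.
        Type comType = Type.GetTypeFromCLSID(clsid);

        // Performs the actual COM activation; this throws if the class cannot be created,
        // for example when it is not registered.
        return Activator.CreateInstance(comType);
    }
}
```

The result can be cast to the IFilter interface from the CodeProject classes, e.g. by passing new Guid("{E8978DA6-047F-4E3D-9C78-CDBE46041603}") for Adobe's PDF filter.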

Note: Any code released into public domain. No attribution required.

Licensed under: CC-BY-SA with attribution