Question

Since there is virtually no documentation and there are almost no code snippets on programming inside OpenText Capture Center, I need some input from someone with experience.

Here is the crux of what I need: in the Scripting Manager, I need to be able to access all of the Phrase objects that the OCR identified in the document, regardless of which Fields were matched or identified during extraction.

As long as I have access to the OCR phrases, I can do two things that will greatly increase our matching percentage on any field:

  1. Sanitize and transform the invoice phrases as a pre-processing step before matching occurs (e.g., turn Corporation into CORP, remove apostrophes, etc.).
  2. Write a custom matching function that understands our data better than the native Generic SnapMatch.
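Neither step depends on anything Capture Center specific, so both can be sketched in plain Python. This is a hypothetical illustration, not an OpenText or SnapMatch API: `normalize_phrase` applies the kind of sanitization described in item 1 (upper-casing, dropping apostrophes, and abbreviating words through an assumed `ABBREVIATIONS` table), and `best_vendor_match` stands in for item 2 using the standard library's `difflib` similarity ratio rather than SnapMatch's own algorithm. How the phrases would be obtained from Capture Center is deliberately left open.

```python
import re
import difflib

# Hypothetical abbreviation table (an assumption); extend it with your own domain rules.
ABBREVIATIONS = {
    "CORPORATION": "CORP",
    "INCORPORATED": "INC",
    "COMPANY": "CO",
    "LIMITED": "LTD",
}


def normalize_phrase(text: str) -> str:
    """Upper-case, drop apostrophes, turn other punctuation into spaces, abbreviate words."""
    text = text.upper().replace("'", "").replace("\u2019", "")  # straight and curly apostrophes
    text = re.sub(r"[^A-Z0-9]+", " ", text)  # any run of non-alphanumerics becomes one space
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(words)


def best_vendor_match(phrase, vendors, cutoff=0.8):
    """Return the vendor whose normalized name is most similar to the phrase, or None.

    Both sides are normalized before comparison; if two vendors normalize to the
    same key, the later one wins in this simple sketch.
    """
    target = normalize_phrase(phrase)
    normalized = {normalize_phrase(v): v for v in vendors}
    hits = difflib.get_close_matches(target, list(normalized), n=1, cutoff=cutoff)
    return normalized[hits[0]] if hits else None
```

For example, `best_vendor_match("ACME Corporation", ["Acme Corp", "Apex Ltd"])` normalizes the phrase to `ACME CORP` and returns `"Acme Corp"`. The `cutoff` of 0.8 is an arbitrary starting point that would need tuning against real invoices.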

Thanks!


Solution

OK, ultimately there is no way to do this through the Scripting Manager entry points. All of the image data is parsed and extracted before control ever reaches the Scripting Manager. By the time you get to the extraction phase in the manager, what you have is an XML Runtime document representing the meta structure of the output document, populated only with the data that extraction "thought might be useful". Every other "phrase" or data item that was extracted but did not fit a field directly, or an alternative, is "discarded". This means that a vendor name, or anything similar that DoKuStar didn't consider interesting, cannot be searched for by any code mechanism.

The problem I needed to solve was very specific to my domain and was caused indirectly by a policy of the Oracle group: vendor names were stripped of special characters and concatenated. In short, they did not match what was printed on the invoices, so SnapMatch was virtually useless.
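To make the mismatch concrete, here is a small sketch. The exact Oracle rule isn't given in the post, so `oracle_style_key` is a hypothetical approximation (remove everything that is not a letter or digit, then upper-case), and the vendor name is made up. It shows why a literal comparison fails while a comparison on a derived key would not.

```python
import re


def oracle_style_key(name: str) -> str:
    # ASSUMPTION: approximates the policy described above (special characters
    # stripped, words concatenated); the real rule is not documented in the post.
    return re.sub(r"[^A-Za-z0-9]", "", name).upper()


invoice_name = "O'Brien & Sons, Inc."  # hypothetical name as printed on an invoice
oracle_name = "OBRIENSONSINC"          # hypothetical name as stored in Oracle

print(invoice_name == oracle_name)                    # False: literal comparison fails
print(oracle_style_key(invoice_name) == oracle_name)  # True: same derived key
```

Applying such a key would still require access to the OCR text, which is exactly what the Scripting Manager does not provide here.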

I created an intermediate solution in which users can update the local SnapMatch database directly ("Rename Vendor", so to speak). As we make corrections, our local SnapMatch database comes to match what is on the invoices, even though the Oracle database doesn't. All in all, this is not a solution on the coding side, but it turned out to be an effective solution to the domain problem.
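The "Rename Vendor" idea can be sketched with an ordinary local table. This is not the SnapMatch database schema (which the post doesn't describe); the `vendor` table, its columns, and the helper functions are hypothetical, and SQLite is used only because it ships with Python. The local copy is seeded from ERP names, a user rename overwrites only the local name, and later lookups by the invoice name succeed while the ERP record stays untouched.

```python
import sqlite3


def build_local_vendor_db(erp_vendors):
    """Seed a hypothetical local vendor table from (vendor_id, name) pairs taken from the ERP."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vendor (vendor_id TEXT PRIMARY KEY, name TEXT NOT NULL)")
    with conn:  # commits the inserts
        conn.executemany("INSERT INTO vendor (vendor_id, name) VALUES (?, ?)", erp_vendors)
    return conn


def rename_vendor(conn, vendor_id, invoice_name):
    """User correction: store the name as it appears on invoices. Returns True if a row changed."""
    with conn:
        cur = conn.execute(
            "UPDATE vendor SET name = ? WHERE vendor_id = ?", (invoice_name, vendor_id)
        )
    return cur.rowcount == 1


def find_vendor_id(conn, invoice_name):
    """Exact lookup by name, standing in for the matcher that reads the local table."""
    row = conn.execute("SELECT vendor_id FROM vendor WHERE name = ?", (invoice_name,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_local_vendor_db([("V001", "OBRIENSONSINC")])
    print(find_vendor_id(conn, "O'Brien & Sons, Inc."))  # None: ERP-style name doesn't match
    rename_vendor(conn, "V001", "O'Brien & Sons, Inc.")
    print(find_vendor_id(conn, "O'Brien & Sons, Inc."))  # V001
```

If the local table were periodically reloaded from Oracle, the renames would need to survive the reload (for instance, kept in a separate alias table); the post doesn't say how that was handled.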

Licensed under: CC-BY-SA with attribution