Question

I have a number of documents that are scanned, converted to PDF or image files, and then stored in an Alfresco repository.

I can search these scanned items by their metadata properties, but can anybody help me with how to search them by the content of the scanned documents? E.g. I have scanned a form with a user's details filled in, and I want to search Alfresco for that particular user's name.

How is this possible? Is there any way to do it as close to the scanner end as possible?

Solution 2

I can integrate with Kofax and scan the content through it. The integration can automatically capture all the details, including the text of the scanned content, and fill them into a custom content model that has mappings for these fields and is attached to the scanned content. Once that is done, the content is picked up by Alfresco indexing, after which users can search for it.

I also assume Kofax provides many components, such as Scan, Virtual ReScan (VRS), Recognition (OCR / OMR / ICR), Validation, Verification, Quality Control, PDF Generator, etc., which are available OOTB but need to be configured for our implementation. E.g. by configuring the quality module, we can see errors generated while scanning the content. Since I am looking for an Alfresco + Kofax integration, I assume these features are provided by Kofax OOTB and I just need to map the scanned content to the Alfresco content repository, storing content and metadata as per the defined content model.

OTHER TIPS

Use Ephesoft or Kofax as the scanning software. Both products have integrations with Alfresco where they can automatically recognize fields and map them to an Alfresco model.

After this process has been done, you can search on these specific fields.

There are a number of options you could explore, but they all require that OCR is performed on the scanned content and that the extracted text is stored either in the PDF (if you're using PDFs) or in Alfresco as metadata or full text.

If you store the OCR text in the PDF, Alfresco will then be able to extract the text using its content transformers, as long as the content type being used specifies that the full text of the content is indexed.
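
Once the OCR text is indexed, searching for a user's name is an ordinary full-text query. As a rough sketch only, assuming an Alfresco version that exposes the public Search REST API and accepts AFTS queries, the request could be built as below. The helper names and the base URL in the usage note are made up for illustration, and authentication is left out.

```python
import json
import urllib.request

# Path of the public Search REST API (assumed to be enabled on the server).
SEARCH_PATH = "/alfresco/api/-default-/public/search/versions/1/search"


def build_fulltext_query(phrase, max_items=25):
    """Build a search body that matches a phrase in the indexed content text."""
    # Escape backslashes and double quotes so the phrase stays one quoted term.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "query": {"language": "afts", "query": f'TEXT:"{escaped}"'},
        "paging": {"maxItems": max_items, "skipCount": 0},
    }


def build_search_request(base_url, phrase):
    """Return an unsent POST request; add credentials before sending it."""
    url = base_url.rstrip("/") + SEARCH_PATH
    body = json.dumps(build_fulltext_query(phrase)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `build_search_request("http://localhost:8080", "John Smith")` prepares a query for documents whose extracted text contains that name; sending it with `urllib.request.urlopen` would still require an authorization header for a real repository.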

To keep the solution close to the scanner, you will want to investigate a capture solution such as Ephesoft, which is used for intelligent document capture and processing. Other solutions are available (such as Kofax), or you can implement your own solution using Tesseract.
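
If you go the Tesseract route, a minimal sketch of the OCR step could look like the following. It assumes the `tesseract` command-line tool is installed on the scanning workstation and uses its `pdf` output config to write a PDF with an embedded text layer. The function names and paths are hypothetical, and uploading the result to Alfresco is not shown.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, language="eng"):
    """Return the argument list for producing a searchable PDF with the Tesseract CLI."""
    # "pdf" is a Tesseract config name; the tool appends ".pdf" to output_base.
    return ["tesseract", str(image_path), str(output_base), "-l", language, "pdf"]


def ocr_to_searchable_pdf(image_path, output_dir, language="eng"):
    """Run OCR on one scanned image and return the path of the generated PDF."""
    image_path = Path(image_path)
    output_base = Path(output_dir) / image_path.stem
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(
        build_tesseract_command(image_path, output_base, language),
        check=True,  # raise CalledProcessError if OCR fails
    )
    return Path(str(output_base) + ".pdf")
```

Calling `ocr_to_searchable_pdf("scans/form.png", "out")` would write `out/form.pdf`, which can then be stored in Alfresco so that the embedded text is extracted and indexed.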

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow