Question

I've seen some questions about deciding the best OCR result given the output of different engines, and the answer is typically "choose the best engine". What I want, however, is to capture several frames of text images, with possible temporary occlusions or temporary OCR failures. I'm using tesseract-ocr with python-tesseract.

Considering the OCR outputs of the last N frames, I want to decide which is the best result (line by line, for simplicity).

For example, for N = 3, we could use median filtering:

ABXD
XBCX
AXCD

When 2 out of 3 characters in a column agree, the majority wins, so the result would be ABCD. However, that is not so easy with strings of different lengths. If I expect a given length M (when scanning a price table, the rows are typically XX.XX), I can always penalize strings longer than M.
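
For equal-length strings, that column-wise vote takes only a few lines of Python; a minimal sketch (the helper name column_majority is mine, not part of any library):

from collections import Counter

def column_majority(candidates):
    # Column-wise majority vote over equal-length OCR readings.
    # zip(*candidates) yields one tuple per character column.
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*candidates)
    )

print(column_majority(["ABXD", "XBCX", "AXCD"]))  # prints ABCD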

If we were talking about numbers, median filtering would work quite well (as in simple background subtraction in computer vision), and so would some least-mean-squares adaptive filtering. There's also the problem of visually similar characters: l and 1 can look almost identical, depending on the font.
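
For numeric fields like the XX.XX prices above, the two ideas combine naturally: map a few font-dependent confusables to digits, drop readings that do not match the expected shape, and median-filter the surviving values. A sketch under those assumptions; the confusion table below is illustrative and would have to be tuned to the actual font:

import re
import statistics

# Illustrative confusion map; which pairs matter depends on the font.
CONFUSABLES = str.maketrans({"l": "1", "I": "1", "O": "0", "o": "0", "S": "5"})

def median_numeric_read(candidates, pattern=r"\d{2}\.\d{2}"):
    # Normalize confusable glyphs, keep only XX.XX-shaped readings,
    # then take the numeric median of what survives.
    values = []
    for s in candidates:
        s = s.translate(CONFUSABLES)
        if re.fullmatch(pattern, s):
            values.append(float(s))
    return statistics.median(values) if values else None

print(median_numeric_read(["12.99", "l2.99", "12.39"]))  # prints 12.99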

I was also thinking of using string distances between each pair of strings, for example choosing the string with the smallest sum of distances to the others.
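
That medoid idea is easy to prototype with plain Levenshtein distance; a self-contained sketch (a C-backed library such as python-Levenshtein would be faster on long lines):

def levenshtein(a, b):
    # Classic dynamic-programming edit distance, computed row by row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def medoid(candidates):
    # The reading with the smallest summed distance to all the others.
    return min(candidates,
               key=lambda s: sum(levenshtein(s, t) for t in candidates))

print(medoid(["ABXD", "XBCX", "AXCD"]))  # prints ABXD (ties with AXCD; min keeps the first)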

Has anyone addressed this kind of problem before? Is there a known algorithm for it that I should be aware of?


Solution

This problem is called multiple sequence alignment; see, for example, the Wikipedia article of the same name: https://en.wikipedia.org/wiki/Multiple_sequence_alignment
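
To make that concrete: the pairwise building block of multiple sequence alignment is the Needleman-Wunsch dynamic program, and progressive MSA applies it repeatedly across all sequences. Once the OCR candidates are aligned (padded with gap characters), the column-wise majority vote from the question also handles insertions and deletions, not just substitutions. A minimal pairwise sketch with unit costs, not how any particular MSA library implements it:

def align(a, b, gap="-"):
    # Needleman-Wunsch global alignment with unit costs.
    # dp[i][j] = edit distance between a[:i] and b[:j].
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(dp[i - 1][j] + 1,
                           dp[i][j - 1] + 1,
                           dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    # Trace back to recover the two gap-padded strings.
    out_a, out_b, i, j = [], [], n, m
    while i or j:
        if i and j and dp[i][j] == dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i and dp[i][j] == dp[i - 1][j] + 1:
            out_a.append(a[i - 1]); out_b.append(gap); i -= 1
        else:
            out_a.append(gap); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

print(align("ABCD", "ABXCD"))  # prints ('AB-CD', 'ABXCD')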

Licensed under: CC-BY-SA with attribution