OCR Tesseract: scanning chunks of text, not left to right (iOS)

https://stackoverflow.com/questions/21421472

04-10-2022

Question

I have a piece of paper that I want to scan, but its layout means that reading from left to right does not work. Right now Tesseract reads straight across the page from left to right, even when the text on a line belongs to separate groups.

How can I make Tesseract recognize text that is grouped and scan the grouped text together instead of left to right?

Image (can't post images, low rep):

http://cdn.designrshub.com/wp-content/uploads/2012/06/alignment.jpg

For example, how would I make it recognize that each of those four paragraphs is its own "chunk" and scan them separately, instead of scanning the first line of both top paragraphs and then going down from there?

Solution

Tesseract lets you specify the rectangle within an image that you want to scan. If you set the rectangle to the frame of one paragraph, it will scan only that area and return the text found there. You can therefore scan each paragraph separately.

Go to the Tesseract.mm file and add this code:

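// Restrict recognition to a sub-rectangle of the image.
// SetRectangle takes (left, top, width, height) in image coordinates.
- (void)setRect:(CGRect)rect {
    _tesseract->SetRectangle(rect.origin.x, rect.origin.y, rect.size.width, rect.size.height);
}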

Go to the Tesseract.h file and declare the method:

- (void)setRect:(CGRect)rect;
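
The setRect: method above hands CGFloat values to SetRectangle, which takes integer left, top, width and height. If the frame is measured in points, or might run past the image edge, it may help to convert and clamp it first. The following is only a sketch in plain C++ (so it could sit in the same .mm file); PixelRect, toPixelRect and the scale parameter are hypothetical names and not part of the wrapper or of Tesseract.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Pixel-space rectangle in the order SetRectangle expects:
// left, top, width, height. (Hypothetical helper type.)
struct PixelRect {
    int left;
    int top;
    int width;
    int height;
};

// Convert a rectangle given in points to whole pixels and clamp it to the
// image bounds. `scale` is an assumed points-to-pixels factor, e.g. 2.0 for
// a @2x image; use 1.0 if the frame is already in pixels.
PixelRect toPixelRect(double x, double y, double w, double h,
                      double scale, int imageWidth, int imageHeight) {
    assert(scale > 0.0 && imageWidth > 0 && imageHeight > 0);

    // Round outwards so the edges of the paragraph are not cut off.
    int left   = static_cast<int>(std::floor(x * scale));
    int top    = static_cast<int>(std::floor(y * scale));
    int right  = static_cast<int>(std::ceil((x + w) * scale));
    int bottom = static_cast<int>(std::ceil((y + h) * scale));

    // Clamp every edge into [0, imageWidth] x [0, imageHeight].
    left   = std::max(0, std::min(left, imageWidth));
    top    = std::max(0, std::min(top, imageHeight));
    right  = std::max(left, std::min(right, imageWidth));
    bottom = std::max(top, std::min(bottom, imageHeight));

    return PixelRect{left, top, right - left, bottom - top};
}
```

The four fields of the result can then be passed on to SetRectangle in place of the raw CGRect values.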

Then, once the image has been set, set the frame before calling recognizedText:

[tesseract setRect:CGRectMake(0, 0, 100, 100)];
[tesseract recognizedText];
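
For the four-paragraph example in the question, one frame per paragraph is needed. As a rough sketch, and assuming the page really does split into an even grid (real scans usually need measured or detected bounding boxes instead), the frames could be computed like this in plain C++. Region and splitIntoGrid are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <vector>

// One scan region in image pixels: left, top, width, height.
// (Hypothetical helper type.)
struct Region {
    int left;
    int top;
    int width;
    int height;
};

// Split an image into rows x cols regions, returned row by row
// (top-left, top-right, bottom-left, bottom-right for a 2 x 2 grid).
// Integer division is applied to the edges, so the regions tile the
// image exactly even when the size is not divisible by the grid.
std::vector<Region> splitIntoGrid(int imageWidth, int imageHeight,
                                  int rows, int cols) {
    assert(imageWidth > 0 && imageHeight > 0 && rows > 0 && cols > 0);

    std::vector<Region> regions;
    for (int r = 0; r < rows; ++r) {
        int top = imageHeight * r / rows;
        int bottom = imageHeight * (r + 1) / rows;
        for (int c = 0; c < cols; ++c) {
            int left = imageWidth * c / cols;
            int right = imageWidth * (c + 1) / cols;
            regions.push_back(Region{left, top, right - left, bottom - top});
        }
    }
    return regions;
}
```

Each region from splitIntoGrid(width, height, 2, 2) would then be passed to setRect: via CGRectMake(left, top, width, height), followed by a call to recognizedText, so that every paragraph comes back as its own block of text.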

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow