Tesseract - Entire line output

https://stackoverflow.com/questions/22687127

ocr
tesseract

22-06-2023

Question

I am trying to OCR a few tables using Tesseract. These tables have the following format:

Item One name                       Item One category
(Item description if any)

Item Two name                       Item Two category
(Item description if any)

There is some space between the name and the category. The output produced looks like this:

Item One name
(Item description if any)

Item Two name
(Item description if any)


Item One category

Item Two category

Is there a way to produce output for each entire line, rather than this column-wise output with one column below the other?

I am running Tesseract with a simple command line:

tesseract ~/Desktop/imagename.jpg out

Solution

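Try a different page segmentation mode (PSM), such as 4 (assume a single column of text of variable sizes) or 6 (assume a single uniform block of text). The default mode segments the page automatically, so it can treat the widely spaced names and categories as two separate columns and output them one after the other.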
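As a rough sketch of how that might look with the command from the question, the loop below runs both suggested modes so the two results can be compared. The output base names (out_psm4, out_psm6) are made up for illustration, and the `--psm` spelling assumes Tesseract 4.0 or later; 3.x releases use `-psm` instead.

```shell
# Image path from the question; the output base names are illustrative.
image="$HOME/Desktop/imagename.jpg"

# PSM 4: assume a single column of text of variable sizes.
# PSM 6: assume a single uniform block of text.
for psm in 4 6; do
    if command -v tesseract >/dev/null 2>&1; then
        # Tesseract 4.0+ spells the flag --psm; on 3.x use -psm instead.
        tesseract "$image" "out_psm$psm" --psm "$psm" ||
            echo "tesseract failed for PSM $psm" >&2
    else
        echo "tesseract not found on PATH" >&2
    fi
done
```

Each run writes a text file named after its output base (out_psm4.txt and out_psm6.txt here). Which mode keeps the name and category on one line may depend on the image, so it is worth checking both.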

Licensed under: CC-BY-SA with attribution
