Train tesseract 3 to get table of letters


First, I suggest you preprocess your image, for example by making the dark parts darker and blurring it a little. Feel free to experiment until Tesseract stops seeing letters in the filled-in squares.

Second, you have two options:

One, you can enable hOCR output and parse the layout of the scanned letters yourself. hOCR is an HTML-based format that records the bounding-box coordinates of every recognized word, so you can work out from it where the rows and columns are.
Two, you can try to make Tesseract recognise the layout properly instead of treating the text as rotated by 90°.
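For the first option, here is a minimal sketch of how the hOCR coordinates could be turned into rows. It assumes word elements carry the class `ocrx_word` or `ocr_word` and a `title` attribute containing `bbox x0 y0 x1 y1`, which is how I remember Tesseract's hOCR looking; check your own output first. `WordCollector`, `group_rows`, the 10-pixel tolerance and the `SAMPLE` snippet are all made up for illustration.

```python
import re
from html.parser import HTMLParser

# Assumed title layout: "bbox x0 y0 x1 y1", possibly followed by other fields.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class WordCollector(HTMLParser):
    """Collect (text, x0, y0, x1, y1) for every hOCR word element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word element currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        match = BBOX_RE.search(attrs.get("title") or "")
        if match and ("ocrx_word" in classes or "ocr_word" in classes):
            self._bbox = tuple(int(n) for n in match.groups())
            self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # The first closing span after a word's text ends that word.
        if tag == "span" and self._bbox is not None:
            text = "".join(self._text).strip()
            if text:
                self.words.append((text, *self._bbox))
            self._bbox = None


def group_rows(words, tolerance=10):
    """Group words whose vertical centres lie within `tolerance` pixels."""
    rows = []
    for word in sorted(words, key=lambda w: (w[2] + w[4]) / 2):
        centre = (word[2] + word[4]) / 2
        if rows and abs(centre - rows[-1]["centre"]) <= tolerance:
            rows[-1]["words"].append(word)
        else:
            rows.append({"centre": centre, "words": [word]})
    # Within each row, order the words left to right by x0.
    return [sorted(row["words"], key=lambda w: w[1]) for row in rows]


# Hypothetical hOCR fragment, not real Tesseract output.
SAMPLE = """
<span class='ocr_line' title='bbox 10 10 300 40'>
<span class='ocrx_word' title='bbox 10 10 120 40; x_wconf 90'>01ABC-E</span>
<span class='ocrx_word' title='bbox 200 12 300 41; x_wconf 91'>26ABCDE</span>
</span>
<span class='ocr_line' title='bbox 10 50 300 80'>
<span class='ocrx_word' title='bbox 200 50 300 80; x_wconf 88'>27ABCDE</span>
<span class='ocrx_word' title='bbox 10 51 120 79; x_wconf 80'>02A</span>
</span>
"""

if __name__ == "__main__":
    collector = WordCollector()
    collector.feed(SAMPLE)
    for row in group_rows(collector.words):
        print([word[0] for word in row])
```

Columns can be found the same way by clustering on `x0` instead of the vertical centre. The tolerance will need tuning to the scan resolution.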

Anyway, this is what I did:

1. I ran the image through ImageMagick:

$ convert CDZjN.png -deskew 40% -contrast-stretch 7%x10% -filter lanczos -resize 250% ooo.png

2. I created a config file t.conf for Tesseract, disabling vertical text detection and the English dictionaries:

textord_tabfind_vertical_text 0
load_system_dawg 0
load_freq_dawg 0
load_punc_dawg 0
load_number_dawg 0
load_unambig_dawg 0
load_bigram_dawg 0
load_fixed_length_dawgs 0

3. I simply ran it:

$ tesseract ooo.png ooo t.conf ; cat ooo.txt
Tesseract Open Source OCR Engine v3.02 with Leptonica
01ABC-E 26ABCDE
02A CDE 27ABCDE
o3 BCDE 28ABCDE
o4 BCDE 29ABCDE
o5 BCDE 30ABCDE
06ABCD. 31ABCDE
07A-CDE 32ABCDE
08ABC.E 33ABCDE
o9 BCDE 34ABCDE
10A CDE 35ABCDE
11ABCD 36ABCDE
12ABC E 37ABCDE
13ABC E 38ABCDE
14ABCD 39ABCDE
15 BCDE 40ABCDE
1s BCDE 41ABCDE
17 BCDE 42ABCDE
18ABCD_ 43ABCDE
19AB DE 44ABCDE
20AB DE 45ABCDE
21ABCDE 46ABCDE
22ABCDE 47ABCDE
23ABCDE 48ABCDE
24ABCDE 49ABCDE
25ABCDE 50ABCDE

Not perfect, but passable.
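If you want to post-process that text, something like the sketch below might be a starting point. It rests on assumptions of mine: that a letter missing from an entry corresponds to a filled-in square, that each line holds one entry per column with 25 rows per column (as in the output above), and that the right-hand entry contains no spaces. Questions are numbered by position rather than by the OCR'd label, because labels like `o3` and `1s` are misread. `missing_letters` and `parse_rows` are made-up names.

```python
ALL_CHOICES = "ABCDE"


def missing_letters(entry):
    """Return the choices that do not appear in one entry such as '01ABC-E'."""
    seen = set(entry[2:])  # skip the two-character question label
    return "".join(c for c in ALL_CHOICES if c not in seen)


def parse_rows(lines, rows_per_column=25):
    """Map question number -> missing letters, numbering entries by position."""
    answers = {}
    for index, line in enumerate(lines, start=1):
        # Assumes exactly two entries per line and no space in the right one;
        # a line with a single token raises ValueError here.
        left, right = line.rsplit(None, 1)
        answers[index] = missing_letters(left)
        answers[index + rows_per_column] = missing_letters(right)
    return answers


if __name__ == "__main__":
    sample = ["01ABC-E 26ABCDE", "02A CDE 27ABCDE", "o3 BCDE 28ABCDE"]
    for number, letters in sorted(parse_rows(sample).items()):
        print(number, letters or "(none)")
```

Entries where nothing is missing come back as an empty string, so you can tell unanswered rows apart from rows where the OCR simply read every letter.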