Regarding your question of why Tesseract delivers better results with a binary image than with a grayscale image as input:
Tesseract binarizes a grayscale image internally. I haven't figured out yet exactly which method it uses; sources on the internet mention sometimes a local adaptive threshold and sometimes a global Otsu threshold. What is certain is that Tesseract performs character recognition on a binary image, and that its own preprocessing can still fail on specific problems (its layout analysis, for example, is not very good). So if you do the preprocessing yourself, give Tesseract only a binary image containing the text, and disable all layout analysis in Tesseract, you can achieve better results than by letting Tesseract do everything for you. As a free, open-source utility it has some known drawbacks, which you have to accept.
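To make the "do the preprocessing yourself" part more concrete, here is a minimal sketch of a global Otsu threshold in plain C++. It is not Tesseract's internal code, only an illustration of one of the methods mentioned above. The function names otsuThreshold and binarize are made up for this example, and it assumes an 8-bit grayscale image stored as a flat buffer with one byte per pixel.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the Otsu threshold t for an 8-bit grayscale
// image. Pixels with value <= t form one class, pixels > t the other.
int otsuThreshold(const std::vector<std::uint8_t>& gray) {
    // Histogram of the 256 possible gray levels.
    std::array<std::size_t, 256> hist{};
    for (std::uint8_t v : gray) ++hist[v];

    const double total = static_cast<double>(gray.size());
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) sumAll += i * static_cast<double>(hist[i]);

    double weightLow = 0.0;  // number of pixels with value <= t
    double sumLow = 0.0;     // sum of gray values of those pixels
    double bestVar = -1.0;
    int best = 0;
    for (int t = 0; t < 256; ++t) {
        weightLow += static_cast<double>(hist[t]);
        sumLow += t * static_cast<double>(hist[t]);
        if (weightLow == 0.0) continue;        // lower class still empty
        const double weightHigh = total - weightLow;
        if (weightHigh == 0.0) break;          // upper class empty from here on

        // Between-class variance (up to a constant factor); Otsu picks the
        // threshold that maximizes it.
        const double meanLow = sumLow / weightLow;
        const double meanHigh = (sumAll - sumLow) / weightHigh;
        const double diff = meanLow - meanHigh;
        const double betweenVar = weightLow * weightHigh * diff * diff;
        if (betweenVar > bestVar) {
            bestVar = betweenVar;
            best = t;
        }
    }
    return best;
}

// Hypothetical helper: maps pixels above the threshold to 255 (white) and
// all others to 0 (black).
std::vector<std::uint8_t> binarize(const std::vector<std::uint8_t>& gray,
                                   int threshold) {
    std::vector<std::uint8_t> out(gray.size());
    for (std::size_t i = 0; i < gray.size(); ++i) {
        out[i] = gray[i] > threshold ? 255 : 0;
    }
    return out;
}
```

A global threshold like this only works well on evenly lit images with dark text on a light background (or the reverse); for uneven lighting a local adaptive threshold is probably the better choice. The result is still an 8-bit buffer, so it can be saved as an image or handed on to Tesseract like any other grayscale input.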
If you use Tesseract as a command-line tool, this thread is very useful for the parameters, in particular the page segmentation mode: tesseract command line page segmentation
If you use the Tesseract source code in your own C++ code, you have to initialize Tesseract with the correct parameters. The parameters are described on the Tesseract API site: tesseract API