OCR: How to find the right ColorMatrix to define new colors?
Question
I'm stuck on determining the dimensions of each line. The list I want to scrape contains various colors, and what causes me the most trouble is the selection:
As you can see, the picture I'm trying to analyze has a white background with green text. The selection has a grey background with black text. Every second line has a slightly greyer background, but I managed to compensate for that by adjusting the contrast with a ColorMatrix.
Just for reference, I also have some other ColorMatrix definitions, such as Greyscale, Negative, SetContrast, SetBrightness and so on.
My method for finding the lines works well on most of the picture, but the selection breaks it.
So now I'm stuck and don't know what to do. I googled for an hour, but didn't find a solution.
I thought that maybe I could turn the grey selection background white without affecting the text, and greyscale the rest of the picture. But I can't find a ColorMatrix that does the job.
Do you know of one, or do you have a better solution?
Solution
Why use a ColorMatrix at all?
It is much easier (at least for your specific example) with ImageMagick's -threshold operation:
convert \
http://img18.imageshack.us/img18/210/lobbymd9.jpg \
-threshold 50% \
result.jpg
Visual result: [original image] => [thresholded image]
Thresholding leaves only two possible values (zero or maximum) for each color channel. Every value below the threshold is set to 0, and every value above it is set to the maximum: 255 at 8-bit depth, or 65535 at 16-bit depth. The end result is a pure black-and-white picture.
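To make the effect concrete, here is a minimal Python sketch of the same idea applied to a small grid of 8-bit greyscale values. It is only an illustration of the principle, not ImageMagick's actual implementation; the function name `threshold` and the sample pixel values are made up for this example.

```python
def threshold(pixels, percent=50, max_value=255):
    """Return a two-level copy of a greyscale pixel grid.

    Values above the cutoff become max_value, all others become 0.
    """
    cutoff = max_value * percent / 100  # 50% of 255 -> 127.5
    return [[max_value if v > cutoff else 0 for v in row] for row in pixels]


# Hypothetical sample values (not measured from the real screenshot):
# row 1: white background, near-white background, dark text
# row 2: grey selection background (twice), black text
sample = [
    [255, 250, 40],
    [200, 190, 10],
]

result = threshold(sample)
print(result)  # [[255, 255, 0], [255, 255, 0]]
```

With these assumed values, the grey selection background ends up as white as the normal background while the text stays black, which is why the selected row no longer stands out for line detection.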