Question

I have a lot of scans of text pages (black text on a white background).

My usual approach is to clean them in GIMP with the Curves dialog, using a simple curve with only four points: (0,0), (63,0), (224,255), (255,255).

This turns all the greyish text pitch black, makes the text sharper, and turns most of the whitish pixels pure white.

How can I achieve the same effect in a script using ImageMagick or some other Linux tool that runs completely from the command line?

-normalize and -contrast-stretch don't work because they operate on pixel counts. I need an operator that makes the gray levels 0-63 pitch black and everything above 224 pure white, and normalizes the rest.


Solution

ImageMagick's Color Modifications page describes many color manipulation operators.

In this specific case, two operators are of interest:

-level gives you pure black and pure white pixels at the ends of the curve and a linear distribution in between.
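As a rough sketch of how the GIMP curve from the question might map onto -level: the operator accepts black and white points as percentages, so the 8-bit values 63 and 224 are converted first. The helper names to_percent and clean_scan and the file names are made up for illustration, and the legacy convert command is assumed (on ImageMagick 7 the command is magick).

```shell
#!/bin/sh
# Convert an 8-bit level (0-255) to a percentage with one decimal place.
to_percent() {
    LC_ALL=C awk -v v="$1" 'BEGIN { printf "%.1f", v * 100 / 255 }'
}

black=$(to_percent 63)    # GIMP curve point (63,0)
white=$(to_percent 224)   # GIMP curve point (224,255)

# clean_scan INPUT OUTPUT (hypothetical helper):
# values at or below the black point become pure black, values at or
# above the white point become pure white, the rest is stretched linearly.
clean_scan() {
    convert "$1" -colorspace Gray -level "${black}%,${white}%" "$2"
}

# Example call (placeholder file names):
# clean_scan scan.png clean.png
```

Unlike -normalize and -contrast-stretch, the thresholds here are fixed intensity values, not pixel counts, which is what the question asks for.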

-sigmoidal-contrast creates a smoother curve between the extremes, which works better for color photos.
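A possible way to try the sigmoidal variant is sketched below. The mid-point is placed halfway between the two curve corners from the question (63 and 224); the contrast factor of 10 is only an assumed starting value to tune by eye, and sigmoid_scan is a hypothetical helper name.

```shell
#!/bin/sh
# Mid-point between the two GIMP curve corners (63 and 224),
# expressed as a percentage of the 0-255 range.
mid=$(LC_ALL=C awk 'BEGIN { printf "%.1f", (63 + 224) / 2 * 100 / 255 }')

# sigmoid_scan INPUT OUTPUT (hypothetical helper):
# applies an S-shaped contrast curve centred on $mid.
# The factor 10 is an arbitrary starting value for this sketch.
sigmoid_scan() {
    convert "$1" -colorspace Gray -sigmoidal-contrast "10,${mid}%" "$2"
}

# Example call (placeholder file names):
# sigmoid_scan scan.png smooth.png
```

A higher contrast factor brings the result closer to a hard threshold; a lower one keeps more of the grey anti-aliasing around the letters.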

To get a result similar to GIMP's, you can try applying one after the other (to make text and black areas really black).

In all cases, you will want to run -normalize first (or even -contrast-stretch to merge most of the noise) to make sure no black/white levels are wasted. Without this, the darkest color could be lighter than rgb(0,0,0) and the brightest color could be below pure white.
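Put together for a batch of scans, this could look roughly like the following sketch. The percentages 24.7% and 87.8% correspond to 63 and 224 out of 255; clean_all, the DRY_RUN switch, and the directory names are assumptions made for this example, not part of ImageMagick.

```shell
#!/bin/sh
# clean_all OUTDIR FILE... (hypothetical helper):
# runs -normalize first, then -level, on every given file.
# With DRY_RUN=1 the commands are printed instead of executed.
clean_all() {
    outdir=$1
    shift
    for f in "$@"; do
        out="$outdir/$(basename "$f")"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo convert "$f" -colorspace Gray -normalize -level 24.7%,87.8% "$out"
        else
            convert "$f" -colorspace Gray -normalize -level 24.7%,87.8% "$out"
        fi
    done
}

# Example call (placeholder paths; the output directory must exist):
# mkdir -p cleaned && clean_all cleaned scans/*.png
```

Running it once with DRY_RUN=1 is a cheap way to check the generated commands before touching a large set of scans.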

OTHER TIPS

[magick-users] Curves in ImageMagick

The first link in that archived message is a shell script that I think does what you're looking for.

Licensed under: CC-BY-SA with attribution