Extract black objects from color background

Question 1

After inspecting the images, I decided that a robust threshold will be more simple than anything. For example, looking at the C=40%, M=40% photo, I first inverted the intensities so black (the signal) will be white just using

im=(abs(255-im));

we can inspect its RGB histograms using this :

hist(reshape(single(im),[],3),min(single(im(:))):max(single(im(:)))); 
colormap([1 0 0; 0 1 0; 0 0 1]);

enter image description here

so we see that there is a large contribution to some middle intensity whereas the "signal" which is now white, is mostly separated to higher value. I then applied a simple thresholds as follows:

thr = @(d) (max([min(max(d,[],1))  min(max(d,[],2))])) ;
for n=1:size(im,3)
    imt(:,:,n)=im(:,:,n).*uint8(im(:,:,n)>1.1*thr(im(:,:,n)));
end

imt=rgb2gray(imt);

and got rid of objects smaller than some typical area size

min_dot_area=20;
bw=bwareaopen(imt>0,min_dot_area);
imagesc(bw); 
colormap(flipud(bone));

here's the result together with the original image: enter image description here

The origin of this threshold is from this code I wrote that assumed sparse signals in the form of 2-D peaks or blobs in a noisy background. By sparse I meant that there's no pile up of peaks. In that case, when projecting max(image) on the x or y axis (by (max(im,[],1) or (max(im,[],1) you get a good measure of the background. That is because you take the minimal intensity of the max(im) vector.

If you want to look at this differently you can look at the histogram of the intensities of the image. The background is supposed to be a normal distribution of some kind around some intensity, the signal should be higher than that intensity, but with much lower # of occurrences. By finding max(im) of one of the axes (x or y) you discover what was the maximal noise level.

You'll see that the threshold picks that point in the histogram where there are still some noise above it, but ALL the signal is above it too. that's why I adjusted it to be 1.1*thr. Last, there are many fancier ways to obtain a robust threshold, this is a quick and dirty way that in my view is good enough...

Question 2

I would convert the image to the HSV colour space and then use the Value channel. This basically separates colour and brightness information.

This is the 50% cyan image

enter image description here

Then you can just do a simple threshold to isolate the dots.

enter image description here

I just did this very quickly and im sure you could get better results. Maybe find contours in the image and then remove any contours with a small area, to filter any remaining noise.

Question 3

Thanks to everyone for posting his answer! After some search and attempt, I also come up with an adaptive method to extract these black dots from the color background. It seems that considering only the brightness could not solve the problem perfectly. Therefore natan's method which calculates and analyzes the RGB histogram is more robust. Unfortunately, I still cannot obtain a robust threshold to extract the black dots in other color samples, because things are getting more and more unpredictable when we add deeper color (e.g. Cyan > 60) or mix two colors together (e.g. Cyan = 50, Magenta = 50).

One day, I google "extract color" and TinEye's color extraction and color thief inspire me. Both of them are very cool application and the image processed by the former website is exactly what I want. So I determine to implement a similar stuff on my own. The algorithm I used here is k-means clustering. And some other related key words to search may be color palette, color quantation and getting dominant color.

I firstly apply Gaussian filter to smooth the image.

GaussianBlur(img, img, Size(5, 5), 0, 0);

OpenCV has kmeans function and it saves me a lot of time on coding. I modify this code.

// Input data should be float32 
Mat samples(img.rows * img.cols, 3, CV_32F);
for (int i = 0; i < img.rows; i++) {
    for (int j = 0; j < img.cols; j++) {
        for (int z = 0; z < 3; z++) {
            samples.at<float>(i + j * img.rows, z) = img.at<Vec3b>(i, j)[z];
        }
    }
}

// Select the number of clusters
int clusterCount = 4;
Mat labels;
int attempts = 1;
Mat centers;
kmeans(samples, clusterCount, labels, TermCriteria(CV_TERMCRIT_ITER|CV_TERMCRIT_EPS, 10, 0.1), attempts, KMEANS_PP_CENTERS, centers);

// Draw clustered result
Mat cluster(img.size(), img.type());
for (int i = 0; i < img.rows; i++) {
     for(int j = 0; j < img.cols; j++) { 
        int cluster_idx = labels.at<int>(i + j * img.rows, 0);
        cluster.at<Vec3b>(i, j)[0] = centers.at<float>(cluster_idx, 0);
        cluster.at<Vec3b>(i, j)[1] = centers.at<float>(cluster_idx, 1);
        cluster.at<Vec3b>(i, j)[2] = centers.at<float>(cluster_idx, 2);
    }
}
imshow("clustered image", cluster); 
// Check centers' RGB value
cout << centers;

After clustering, I convert the result to grayscale and find the darkest color which is more likely to be the color of the black dots.

// Find the minimum value
cvtColor(cluster, cluster, CV_RGB2GRAY);
Mat dot = Mat::zeros(img.size(), CV_8UC1);
cluster.copyTo(dot);
int minVal = (int)dot.at<uchar>(dot.cols / 2, dot.rows / 2);
for (int i = 0; i < dot.rows; i += 3) {
    for (int j = 0; j < dot.cols; j += 3) {
        if ((int)dot.at<uchar>(i, j) < minVal) {
            minVal = (int)dot.at<uchar>(i, j);
        }
    }
}
inRange(dot, minVal - 5 , minVal + 5, dot);
imshow("dot", dot);

Let's test two images.

(clusterCount = 4)

enter image description here

(clusterCount = 5)

enter image description here

One shortcoming of the k-means clustering is one fixed clusterCount cannot be applied to every image. Also clustering is not so fast for larger images. That's the issue annoys me a lot. My dirty method for better real time performance (on iPhone) is to crop 1/16 of the image and cluster the smaller area. Then compare all the pixels in the original image with each cluster center, and pick the pixel that are the nearest to the "black" color. I simply calculate euclidean distance between two RGB colors.

Question 4

A simple method is to just threshold all the pixels. Here is this idea expressed in pseudo code.

for each pixel in image
    if brightness < THRESHOLD
        pixel = BLACK
    else
        pixel = WHITE

Or if you're always dealing with cyan, magenta and yellow backgrounds then maybe you might get better results with the criteria

if pixel.r < THRESHOLD and pixel.g < THRESHOLD and pixel.b < THRESHOLD

This method will only give good results for easy images where nothing except the black dots is too dark.

You can experiment with the value of THRESHOLD to find a good value for your images.

Question 5

I suggest to convert to some chroma-based color space, like LCH, and adjust simultaneous thresholds on lightness and chroma. Here is the result mask for L < 50 & C < 25 for the input image:

enter image description here

Seems like you need adaptive thresholds since different values work best for different areas of the image.

You may also use HSV or HSL as a color space, but they are less perceptually uniform than LCH, derived from Lab.