Question

I want to compare 2 images and if they are similar than I keep the 2 images. I compute HSV histogram for each image and compare the distance between the histogram.

Now when the 3rd image is obtained I have to compare it to image 1 and image 2 (already stored as one similar type image).

The problem of comparing like above is that increase in images also increase computational cost.

So what i want to do its if 2 images are similar than I want to cluster there features as one hence the features of similar images in future will be compare to the clustered features.

OPTION 1 How if I merge the 2 histogram will it be correct? I dont think so but I am not sure?

OPTION 2 How about using the 2 image feature distribution, I compute a new distribution from the 2 histogram as a combined distribution of both images. ? Does this sound correct?

Was it helpful?

Solution

Let me take it step by step:

Task: compare 2 images. Keep both if they are similar, else merge somehow. Feature Space: HSV histogram.

OPTION 1 Is it correct to merge histograms?

Yes, since you use histograms and not signatures you can just the bins of the two histograms and divide by two.

Excursion: If you want to merge additional images, you have to keep track of the number of already merged, so you know how to weight

Example: histogram with one bin, three pictures
with p1=2, p2=6, p3=10
merge p1,p2 to m_12: (2+6)/2 = 4
merge m_12 and p3: 
((weight * value m_12) + (weigth * value p3)) / 2
= ( (2/3 * 4) + (1/3 * 10) ) / 2
 = 6 [equal to (p1+p2+p3) / 3]

tl;dr yes you can merge them

OPTION 2 How about using the 2 image feature distribution, I compute a new distribution from the 2 histogram as a combined distribution of both images. ? Does this sound correct?

Yes, although i don't know immediately how you want to do it.

If you want to speed your program up, you should check out different distance measures (i only remember SQFD and Earth Movers Distance for signatures unfortunately atm). Often they have a fast but coarse lower bound. That can be used to get a good lower bound for the distance, so you can reduce your search space.

Increase in images also increase computational cost.

Check out hierarchical clustering to find data structures that are suited for large numbers of images.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top