Combining 2 histogram data for a new Data in OpenCV

Question

Let me take it step by step:

Task: compare 2 images. Keep both if they are similar, else merge somehow. Feature Space: HSV histogram.

OPTION 1 Is it correct to merge histograms?

Yes. Since you use histograms and not signatures, you can just add the bins of the two histograms and divide by two (this assumes both histograms use the same bins).

Excursion: If you want to merge additional images, you have to keep track of how many images have already been merged, so you know how to weight each histogram.

Example: histogram with one bin, three pictures
with p1=2, p2=6, p3=10
merge p1,p2 to m_12: (2+6)/2 = 4
merge m_12 and p3:
(weight_m12 * value m_12) + (weight_p3 * value p3)
= (2/3 * 4) + (1/3 * 10)
= 6 [equal to (p1+p2+p3) / 3]
The weights already sum to 1, so there is no further division by 2 in the second merge.

tl;dr yes you can merge them
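
A minimal sketch of that count-weighted merge, assuming the histograms are NumPy arrays with identical bins (which is what cv2.calcHist returns in the Python bindings). The function name merge_histograms and its parameters are made up for illustration, not part of OpenCV.

```python
import numpy as np

def merge_histograms(hist_a, count_a, hist_b, count_b):
    """Merge two averaged histograms, weighting each by the number of
    images it already represents. Returns (merged_hist, total_count)."""
    hist_a = np.asarray(hist_a, dtype=np.float64)
    hist_b = np.asarray(hist_b, dtype=np.float64)
    if hist_a.shape != hist_b.shape:
        raise ValueError("histograms must use the same bins")
    total = count_a + count_b
    # Weights are count_a/total and count_b/total, so they sum to 1.
    merged = (count_a * hist_a + count_b * hist_b) / total
    return merged, total

# One-bin example: p1=2, p2=6, p3=10
m_12, n_12 = merge_histograms([2.0], 1, [6.0], 1)        # plain average of two
m_123, n_123 = merge_histograms(m_12, n_12, [10.0], 1)   # weights 2/3 and 1/3
```

Carrying the count along with each merged histogram is what makes the result independent of the order in which you merge.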

OPTION 2 How about using the 2 image feature distribution, I compute a new distribution from the 2 histogram as a combined distribution of both images. ? Does this sound correct?

Yes, although I can't immediately tell how you want to do it.

If you want to speed your program up, you should check out different distance measures (unfortunately I only remember SQFD and Earth Mover's Distance for signatures at the moment). Often they have a lower bound that is coarse but fast to compute. You can use it to discard candidates cheaply and compute the exact distance only for the remaining ones, which reduces your search space.
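
To illustrate the filter-and-refine idea, here is a rough sketch for the simplest case I can write down with confidence: 1D histograms normalized to sum to 1, with a linear (non-circular) bin axis and a distance of 1 between neighbouring bins. Under those assumptions the Earth Mover's Distance is the summed absolute difference of the cumulative histograms, and the distance between the two histogram centroids is a valid lower bound. All function names are hypothetical, and a circular channel such as hue would need different handling.

```python
import numpy as np

def emd_1d(h1, h2):
    """Exact EMD for two normalized 1D histograms (unit bin spacing)."""
    h1 = np.asarray(h1, dtype=np.float64)
    h2 = np.asarray(h2, dtype=np.float64)
    return float(np.abs(np.cumsum(h1 - h2)).sum())

def centroid_lower_bound(h1, h2):
    """Cheap lower bound on emd_1d: distance between the mean bin positions."""
    idx = np.arange(len(h1))
    return float(abs(np.dot(idx, h1) - np.dot(idx, h2)))

def nearest_histogram(query, candidates):
    """Return (index, distance, exact_evals) of the closest candidate,
    skipping the exact distance whenever the lower bound rules it out."""
    best_idx, best_dist, exact_evals = -1, float("inf"), 0
    for i, cand in enumerate(candidates):
        if centroid_lower_bound(query, cand) >= best_dist:
            continue  # cannot beat the current best, skip the exact EMD
        exact_evals += 1
        d = emd_1d(query, cand)
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist, exact_evals

query = [0.5, 0.5, 0.0, 0.0]
candidates = [[0.4, 0.6, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.0, 0.0, 0.0, 1.0]]
best_idx, best_dist, exact_evals = nearest_histogram(query, candidates)
```

In this toy run only the first candidate needs the exact distance; the other two are rejected by the bound alone. The saving matters more when the exact distance is expensive, as it is for EMD on multi-dimensional histograms or signatures.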

An increase in the number of images also increases the computational cost.

Check out hierarchical clustering to find data structures that are suited for large numbers of images.
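
As a rough idea of what that could look like, here is a naive agglomerative clustering sketch that repeatedly merges the two closest clusters, using a count-weighted mean histogram as each cluster's representative. The L1 distance, the threshold parameter and the function name agglomerate are my assumptions for illustration. The loop is cubic in the number of images, so for real data a library implementation such as scipy.cluster.hierarchy.linkage is probably the better choice.

```python
import numpy as np

def agglomerate(histograms, threshold):
    """Greedy bottom-up clustering. Returns a list of
    (mean_histogram, member_indices) tuples."""
    clusters = [(np.asarray(h, dtype=np.float64), [i])
                for i, h in enumerate(histograms)]
    while len(clusters) > 1:
        # Find the closest pair of cluster representatives (L1 distance).
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.abs(clusters[a][0] - clusters[b][0]).sum()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # remaining clusters are too dissimilar to merge
        (ha, ma), (hb, mb) = clusters[a], clusters[b]
        # Count-weighted mean keeps the representative equal to the
        # average of all member histograms.
        merged = (len(ma) * ha + len(mb) * hb) / (len(ma) + len(mb))
        clusters[a] = (merged, ma + mb)
        del clusters[b]
    return clusters

hists = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
         [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
clusters = agglomerate(hists, threshold=0.5)
members = sorted(sorted(m) for _, m in clusters)
```

A new image then only has to be compared against the cluster representatives first, and against individual members only inside the closest cluster.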