I implemented mutual information and the Kullback-Leibler distance to find similarity in facades. It worked really well; how it works is explained here:
Image-based Procedural Modeling of Facades
All the steps are explained in the paper. However, they are not meant for the similarity of images; they are for the symmetry of image parts. It may work well for image comparison too. It is just an idea, so you should try it. One thing where I really see a problem is rotation: I don't think this procedure is rotation invariant. Maybe you should look into visual information retrieval techniques for your problem.
First you have to compute the mutual information. For that you create an accumulator array of size 257 x 257. Why that size? The first 256 x 256 entries hold the joint distribution, one entry for every pair of gray values, and the extra row and column at index 256 hold the marginal distributions.
for(int x = 0; x < width; x++)
    for(int y = 0; y < height; y++)
    {
        int value1 = image1[y * width + x];
        int value2 = image2[y * width + x];
        // so first the joint distribution
        distributionTable[value1][value2]++;
        // and now the marginal distribution
        distributionTable[value1][256]++;
        distributionTable[256][value2]++;
    }
Now you have the distribution table and can compute the Kullback-Leibler distance.
for(int value1 = 0; value1 < 256; value1++)
    for(int value2 = 0; value2 < 256; value2++)
    {
        // empty bins contribute nothing and would produce log(0)
        if(distributionTable[value1][value2] == 0)
            continue;
        // cast to double, otherwise this is an integer division
        double ab = (double) distributionTable[value1][value2] / size;
        double a = (double) distributionTable[value1][256] / size;
        double b = (double) distributionTable[256][value2] / size;
        //Kullback-Leibler distance
        sum += ab * Math.log(ab / (a * b));
    }
A larger sum tells you that the similarity/symmetry between the two images/regions is high; a sum near 0 means the gray values of the two regions are statistically independent. It should work well if the images differ only in brightness. Maybe there are other distances which are invariant to rotation.
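To make the idea easier to try out, here is a minimal self-contained sketch of the same computation. It is not the code from the paper. The class name MutualInformation, the use of separate arrays for the marginals, and the assumption that both images are flat int arrays of equal length with gray values in 0..255 are my own choices for illustration.

```java
final class MutualInformation {

    /**
     * Mutual information (in nats) of two grayscale images.
     * Assumption: both arrays have the same length and every value is in 0..255.
     */
    static double mutualInformation(int[] image1, int[] image2) {
        if (image1.length != image2.length || image1.length == 0) {
            throw new IllegalArgumentException("images must be non-empty and of equal size");
        }
        int size = image1.length;
        long[][] joint = new long[256][256]; // joint histogram of gray value pairs
        long[] marginal1 = new long[256];    // histogram of image1
        long[] marginal2 = new long[256];    // histogram of image2

        for (int i = 0; i < size; i++) {
            int value1 = image1[i];
            int value2 = image2[i];
            joint[value1][value2]++;
            marginal1[value1]++;
            marginal2[value2]++;
        }

        double sum = 0.0;
        for (int value1 = 0; value1 < 256; value1++) {
            for (int value2 = 0; value2 < 256; value2++) {
                if (joint[value1][value2] == 0) {
                    continue; // empty bins contribute nothing
                }
                double ab = (double) joint[value1][value2] / size;
                double a = (double) marginal1[value1] / size;
                double b = (double) marginal2[value2] / size;
                // Kullback-Leibler distance between the joint distribution
                // and the product of the marginals
                sum += ab * Math.log(ab / (a * b));
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] image = {10, 10, 50, 50, 200, 200, 90, 90};
        int[] brighter = new int[image.length];
        for (int i = 0; i < image.length; i++) {
            brighter[i] = image[i] + 40; // same image, uniformly brighter
        }
        int[] flat = new int[image.length]; // constant image, carries no information

        System.out.println("image vs image:    " + mutualInformation(image, image));
        System.out.println("image vs brighter: " + mutualInformation(image, brighter));
        System.out.println("image vs flat:     " + mutualInformation(image, flat));
    }
}
```

In this toy example the image compared with itself and with its brighter copy give the same value, while the comparison with the constant image gives 0. That matches the point about brightness differences above.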
Maybe you should try SURF, SIFT or something similar. Then you can match the feature points: the more matches you get, the higher the similarity. I think this is a better approach, because you don't have to care about differences in scale, brightness and rotation. It is also quick to implement with OpenCV.