Question

I'm trying to figure out a geo-hashing method for images. It is hard because the space of possible images is of much higher dimensionality than lat/lng. (geo-hashing converts a location to a string where the string progressively refines the location)

So, what I need is something that:

  • INPUT: A list of JPG or PNG images on disk
  • OUTPUT: For each image, a string such that the longer the common prefix between any two images' strings, the higher the chance that the two images are the same.

It doesn't need to be perfect, and it doesn't need to handle extreme cases, like cropped images or heavily adjusted images. It is intended for multiple copies of the same image at different resolutions and compression levels.

I can't use:

  • File or image-data hashing, because even a tiny change between two images produces a completely different hash, so you don't get any notion of proximity
  • Image subtraction, because this isn't going to be an N-to-N comparison.

I've read in other answers to try wavelet compression or a Laplacian/Gaussian pyramid, but I'm not sure how to implement them in Java or Python. However, I have made progress!

  1. Resize to 32x32 using http://today.java.net/pub/a/today/2007/04/03/perils-of-image-getscaledinstance.html so as not to discard data. It's OK that everything gets turned into a square.
  2. Create a pyramid of successively smaller thumbnails all the way down to 2x2.
  3. In the 2x2 image, build a string of "is the next pixel brighter than the current one? If so, 1, else 0" bits. (This throws away all hue and saturation; I may want to use hue somehow.)
  4. Encode the binary numbers from the 8x8 and 32x32 pyramid levels the same way and append them.
  5. Convert the big binary number to some higher radix representation, like Base62.

This seems to work well! Minor differences from compression or color balancing aren't enough to flip an "is the left side of this area brighter than the right side?" bit. However, I think I'm re-inventing the wheel; some sort of progressive encoding might be better? SIFT and other feature detection are overkill, since I don't need to handle cropping or rotation.
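
For anyone who wants to see it concretely, here is a minimal sketch of steps 1-5 in Java, assuming a BufferedImage pipeline; the class and helper names (PyramidHash, scaleTo, brightnessBits, toBase62) are illustrative only, and the single-step bilinear resize is a stand-in for the better multi-step scaling the linked article recommends:

import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.math.BigInteger;
import javax.imageio.ImageIO;

public class PyramidHash {

  // Scale src to w x h; single-step bilinear here for brevity.
  static BufferedImage scaleTo(BufferedImage src, int w, int h) {
    BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
    Graphics2D g = dst.createGraphics();
    g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                       RenderingHints.VALUE_INTERPOLATION_BILINEAR);
    g.drawImage(src, 0, 0, w, h, null);
    g.dispose();
    return dst;
  }

  // "Is the next pixel brighter than the current one?" -> '1' or '0', scanning row-major.
  static String brightnessBits(BufferedImage img) {
    StringBuilder bits = new StringBuilder();
    int prev = -1;
    for (int y = 0; y < img.getHeight(); y++) {
      for (int x = 0; x < img.getWidth(); x++) {
        int rgb = img.getRGB(x, y);
        // crude luminance: average of R, G, B (hue and saturation are discarded)
        int lum = (((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF)) / 3;
        if (prev >= 0) bits.append(lum > prev ? '1' : '0');
        prev = lum;
      }
    }
    return bits.toString();
  }

  static final String ALPHABET =
      "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

  // Pack the binary string into base 62.
  static String toBase62(String bits) {
    BigInteger n = new BigInteger(bits, 2);
    BigInteger base = BigInteger.valueOf(62);
    StringBuilder out = new StringBuilder();
    do {
      BigInteger[] qr = n.divideAndRemainder(base);
      out.insert(0, ALPHABET.charAt(qr[1].intValue()));
      n = qr[0];
    } while (n.signum() > 0);
    return out.toString();
  }

  public static void main(String[] args) throws IOException {
    BufferedImage src = ImageIO.read(new File(args[0]));
    BufferedImage base = scaleTo(src, 32, 32);
    StringBuilder hash = new StringBuilder();
    // coarsest level first, so the string prefix refines progressively
    for (int size : new int[] {2, 8, 32}) {
      hash.append(toBase62(brightnessBits(scaleTo(base, size, size))));
    }
    System.out.println(hash);
  }
}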


Solution 3

Getting good results from the following:

Scale down (using good scaling that doesn't discard information) to three images: 1x7, 7x1, and 6x6.

Convert all to grayscale.

For each image, do the "is the next pixel brighter? '1' : '0'" encoding and output it as base62.

Those outputs become the values for three columns: nice successively refined differencing, packed into 2 chars, 2 chars, and 6 chars. True, it discards all color, but it's still good!
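
A compact sketch of this recipe might look like the following (my own illustration, not the answer's code); it averages the channels for grayscale and, as a shortcut, packs the bits with BigInteger.toString(36) instead of a true base-62 encoder:

import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.math.BigInteger;
import javax.imageio.ImageIO;

public class ThreeColumnHash {

  static BufferedImage scale(BufferedImage src, int w, int h) {
    BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
    Graphics2D g = dst.createGraphics();
    g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                       RenderingHints.VALUE_INTERPOLATION_BILINEAR);
    g.drawImage(src, 0, 0, w, h, null);
    g.dispose();
    return dst;
  }

  // Grayscale "is the next pixel brighter?" bits, packed into a short string.
  static String column(BufferedImage img) {
    StringBuilder bits = new StringBuilder("1"); // leading 1 preserves leading zero bits
    int prev = -1;
    for (int y = 0; y < img.getHeight(); y++) {
      for (int x = 0; x < img.getWidth(); x++) {
        int rgb = img.getRGB(x, y);
        int gray = (((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF)) / 3;
        if (prev >= 0) bits.append(gray > prev ? '1' : '0');
        prev = gray;
      }
    }
    // base 36 stand-in for brevity; a real base-62 encoder would go here
    return new BigInteger(bits.toString(), 2).toString(36);
  }

  public static void main(String[] args) throws IOException {
    BufferedImage src = ImageIO.read(new File(args[0]));
    // 1x7 and 7x1 capture coarse vertical/horizontal gradients, 6x6 the overall layout
    System.out.println(column(scale(src, 1, 7)) + " "
        + column(scale(src, 7, 1)) + " "
        + column(scale(src, 6, 6)));
  }
}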

OTHER TIPS

How about this: the hash string is made up of groups of three characters, representing red, green, and blue:

{R0, G0, B0}, {R1, G1, B1}, {R2, G2, B2}, ...

For each group, the image is resized to a 2^N by 2^N square. Then, the value is the sum (mod, say, 255, or whatever your encoding is) of the differences in intensity of each of the colours over some walk through the pixels.

So as a small example, to compute e.g. group 1 (the 2x2 image), one might use the following code (I have only bothered with the red channel):

// image[i][j].r is assumed to be the red channel value (0-255) at row i, column j.
int rSum = 0;
int rLast = 0;
// Walk the 2x2 image row by row, summing absolute differences between consecutive red values.
for (int i = 0; i < 2; i++) {
  for (int j = 0; j < 2; j++) {
    rSum += Math.abs(image[i][j].r - rLast);
    rLast = image[i][j].r;
  }
}
// Wrap into the output range (mod 255 here, "or whatever your encoding is").
rSum %= 255;
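
To round the example out (this continuation is a sketch of mine, not part of the answer), the green and blue sums could be computed the same way and each sum mapped to one character; here the modulus is 62 so the sums index a base-62 alphabet, and Pixel is a stand-in for whatever pixel type image[i][j] has:

static final String ALPHABET =
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Pixel is assumed to hold 0-255 ints r, g, b, matching image[i][j].r above.
static class Pixel { int r, g, b; }

// One {R, G, B} group for one 2^N x 2^N image; calling this again on the 4x4, 8x8, ...
// resizes appends {R1, G1, B1}, {R2, G2, B2}, and so on.
static String group(Pixel[][] image) {
  int rSum = 0, gSum = 0, bSum = 0;
  int rLast = 0, gLast = 0, bLast = 0;
  for (int i = 0; i < image.length; i++) {
    for (int j = 0; j < image[i].length; j++) {
      rSum += Math.abs(image[i][j].r - rLast);  rLast = image[i][j].r;
      gSum += Math.abs(image[i][j].g - gLast);  gLast = image[i][j].g;
      bSum += Math.abs(image[i][j].b - bLast);  bLast = image[i][j].b;
    }
  }
  return "" + ALPHABET.charAt(rSum % 62)
            + ALPHABET.charAt(gSum % 62)
            + ALPHABET.charAt(bSum % 62);
}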

I believe this has the property that similar images should be close to each other, both for each character in the hash and in terms of successive characters in the hash.

Although for higher values of N the chance of a collision gets higher (many images will have the same sum-of-difference values for R, G, and B intensities across them), each successive iteration should reveal new information about the image that was not tested by the previous iteration.

This could be fairly computationally expensive, but you have the advantage (which I infer from your question you might desire) that you can end the computation of the hash as soon as a negative match is detected within a certain threshold.

Just an idea, let me know if I wasn't clear!

What you're describing seems to me to be an example of Locality-Sensitive Hashing applied to the image similarity problem.

I'm not sure that the common prefix property is desirable for a good hash function. I would expect a good hash function to have two properties:

1) Good localization - for images I1 and I2, norm(Hash(I1) - Hash(I2)) should represent the visually perceived similarity of I1 and I2.

2) Good compression - the high-dimensional image data should be embedded in the low-dimensional space of hashes in the most discriminative way.
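
As a small illustration of property 1, assuming the hashes are strings compared position by position, a Hamming-style distance could serve as the norm (names here are mine, not a library API):

// Hamming-style distance between two hash strings; extra length counts as mismatch.
static int distance(String h1, String h2) {
  int d = 0;
  int shared = Math.min(h1.length(), h2.length());
  for (int i = 0; i < shared; i++) {
    if (h1.charAt(i) != h2.charAt(i)) d++;
  }
  return d + Math.abs(h1.length() - h2.length());
}

// With a locality-sensitive hash, a recompressed copy should land much closer to the
// original than an unrelated image does:
//   distance(hash(photo), hash(photoRecompressed)) << distance(hash(photo), hash(otherPhoto))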

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow