Question

I'm using the SHA-256 algorithm to detect identical images in a database. Because we use a lot of different image formats, I don't want to compute the hash directly on the file. Instead, I want to extract the pixel data and compute the hash on that.

Unfortunately I'm getting a lot of random collisions: out of 6000 images, 68 that do not have identical bytes (using the pixel extraction below) hash to the same value. That seems like an absurdly high number of collisions. Additionally, I dumped the bytes I compute from the pixel data to a file, then tried:

    echo -n [byteDumpFile] | sha256sum

which resulted in different hash values for the dumped images, which leads me to believe I'm doing something wrong when I use MessageDigest.

Here is how I get the pixel data:

    imageBytes = new byte[4 * width * height];
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {

            // grab color information
            int argb = image.getRGB(x, y);

            // a,r,g,b ordered bytes per this pixel. the values are always 0-255 so the byte cast is safe
            int offset = y * width;
            int pushX = x * 4;
            imageBytes[pushX + offset] = (byte) ((argb >> 24) & 0xff);
            imageBytes[pushX + 1 + offset] = (byte) ((argb >> 16) & 0xff);
            imageBytes[pushX + 2 + offset] = (byte) ((argb >> 8) & 0xff);
            imageBytes[pushX + 3 + offset] = (byte) (argb & 0xff);

        }
    }

Then I compute the hash using the MessageDigest class:

    MessageDigest digest = MessageDigest.getInstance("SHA-256");
    digest.reset();


    for (int i = 0; i < imageBytes.length; i++)
    {
        digest.update(imageBytes[i]);
    }

    String hashString = new String(encodeHex(digest.digest()));

where encodeHex is just:

    private static String encodeHex(byte[] data)
    {
        StringBuilder hex = new StringBuilder(2 * data.length);
        for (byte b : data)
        {
            hex.append(HEXES.charAt((b & 0xF0) >> 4)).append(HEXES.charAt((b & 0x0F)));
        }

        return hex.toString();
    }

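Solution

The offset is calculated incorrectly. Each pixel takes 4 bytes, so a row occupies 4 * width bytes, but y * width advances by only a quarter of that. Consecutive rows therefore overlap and overwrite each other, and the tail of the array is never written, which is why images with different pixels can end up with identical byte arrays. It should be:

    int offset = y * width * 4;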
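
To make the effect of the row stride concrete, here is a minimal sketch. The class name `OffsetDemo` and the `rowStride` parameter are made up for illustration and are not part of the original code. It packs the same two images with the stride from the question (`width`) and with the corrected stride (`4 * width`). The images differ only in the last pixel of the first row, which the smaller stride lets the second row overwrite.

```java
import java.awt.image.BufferedImage;
import java.util.Arrays;

final class OffsetDemo {
    // Packs A,R,G,B bytes per pixel; rowStride is the number of bytes each row advances by.
    static byte[] pack(BufferedImage image, int rowStride) {
        int width = image.getWidth();
        int height = image.getHeight();
        byte[] out = new byte[4 * width * height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int argb = image.getRGB(x, y);
                int i = y * rowStride + x * 4;
                out[i] = (byte) (argb >>> 24);
                out[i + 1] = (byte) (argb >>> 16);
                out[i + 2] = (byte) (argb >>> 8);
                out[i + 3] = (byte) argb;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BufferedImage a = new BufferedImage(4, 2, BufferedImage.TYPE_INT_ARGB);
        BufferedImage b = new BufferedImage(4, 2, BufferedImage.TYPE_INT_ARGB);
        a.setRGB(3, 0, 0xFFFF0000); // opaque red
        b.setRGB(3, 0, 0xFF00FF00); // opaque green

        int width = a.getWidth();
        System.out.println("stride = width:     identical bytes? "
                + Arrays.equals(pack(a, width), pack(b, width)));
        System.out.println("stride = 4 * width: identical bytes? "
                + Arrays.equals(pack(a, 4 * width), pack(b, 4 * width)));
    }
}
```

With the smaller stride the two arrays come out identical, so any hash computed from them collides; with the corrected stride they differ.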

A better way to build imageBytes might be a ByteBuffer: it lets you put each byte sequentially without calculating an index, and it can be passed directly to MessageDigest.
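
As a rough sketch of that idea (the class and method names `PixelHasher` and `sha256OfPixels` are hypothetical), the buffer can take each pixel as a whole `int`. A `ByteBuffer` is big-endian by default, so `putInt` writes the bytes in A, R, G, B order, and `MessageDigest.update(ByteBuffer)` consumes the buffer once it has been flipped for reading.

```java
import java.awt.image.BufferedImage;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class PixelHasher {
    // Returns the raw 32-byte SHA-256 digest of the image's ARGB pixel data.
    static byte[] sha256OfPixels(BufferedImage image) {
        int width = image.getWidth();
        int height = image.getHeight();
        ByteBuffer pixels = ByteBuffer.allocate(4 * width * height);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                // Sequential write: no index arithmetic to get wrong.
                pixels.putInt(image.getRGB(x, y));
            }
        }
        pixels.flip(); // switch from writing to reading before hashing

        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(pixels);
            return digest.digest();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }
}
```

Forgetting `flip()` would leave the buffer with nothing remaining to read, so the digest would be computed over zero bytes.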

Other tips

Try passing the whole array in a single call:

    digest.update(imageBytes);
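
For what it's worth, feeding the digest one byte at a time and feeding it the whole array should give the same result, so this change is a simplification rather than a fix for the collisions. A small sketch to check that (the class and method names are made up for illustration):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class UpdateEquivalence {
    private static MessageDigest sha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    // Mirrors the loop in the question: one update call per byte.
    static byte[] hashByteByByte(byte[] data) {
        MessageDigest digest = sha256();
        for (byte b : data) {
            digest.update(b);
        }
        return digest.digest();
    }

    // Hashes the whole array in a single call.
    static byte[] hashInOneCall(byte[] data) {
        return sha256().digest(data);
    }

    public static void main(String[] args) {
        byte[] data = {0, 1, 2, (byte) 0xff, 42};
        System.out.println(Arrays.equals(hashByteByByte(data), hashInOneCall(data)));
    }
}
```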

I came up with this, based on the comments above:

    private String calculateHash(BufferedImage img) throws NoSuchAlgorithmException {
        final int width = img.getWidth();
        final int height = img.getHeight();
        final ByteBuffer byteBuffer = ByteBuffer.allocate(4 * width * height);
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                // grab color information
                int argb = img.getRGB(x, y);

                // a,r,g,b ordered bytes per this pixel. the values are always 0-255 so the byte cast is safe
                byteBuffer.put((byte) ((argb >> 24) & 0xff));
                byteBuffer.put((byte) ((argb >> 16) & 0xff));
                byteBuffer.put((byte) ((argb >> 8) & 0xff));
                byteBuffer.put((byte) (argb & 0xff));
            }
        }

        // a freshly created MessageDigest needs no reset()
        MessageDigest digest = MessageDigest.getInstance("SHA-256");

        // Base64Utils is presumably Spring's org.springframework.util.Base64Utils;
        // java.util.Base64.getEncoder().encodeToString(hashBytes) would also work.
        byte[] hashBytes = digest.digest(byteBuffer.array());
        return Base64Utils.encodeToString(hashBytes);
    }
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow