MD5 hash of an ALAssetRepresentation image does not match a duplicated ALAssetRepresentation image's hash

StackOverflow https://stackoverflow.com/questions/7478085

Question

I am using the following ALAssetRepresentation method to create an NSData object, which I then use both to export an image file and to compute an MD5 hash:

    - (NSUInteger)getBytes:(uint8_t *)buffer fromOffset:(long long)offset length:(NSUInteger)length error:(NSError **)error;

When I re-add the exported file and perform the same operation, the file's MD5 hash is different.

When I create the NSData objects using UIImagePNGRepresentation() and perform the above operations, the MD5 hashes match.

I am trying to avoid UIImagePNGRepresentation() because it is considerably more expensive for what I am doing than the getBytes method.

Any ideas would be appreciated!

Solution

The difference is that UIImagePNGRepresentation() returns only the image data and leaves out the original file's headers.

The problem is that you're probably starting from offset 0. That reads the file headers, which throws off the hash: two files can contain the same image and still differ in header fields such as the creation date.

Instead, here's an example that reads 1K from the middle of the file. For an image, this covers only about 340 pixels (assuming roughly 3 bytes per pixel), so if you're comparing images to find duplicates, for example, you might want to increase the comparison size to about 20K or more.
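
If the comparison size grows to 20K, the read can run past the end of a small file. The sketch below is one way to pick the byte range. It is written in plain C so it can be pasted into an Objective-C file as is. `HashWindow` and `hash_window` are hypothetical names, not part of AssetsLibrary, and the window is centered on the midpoint, which is a small variation on starting at the midpoint.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: the byte range that should be hashed. */
typedef struct {
    int64_t offset;  /* where to start reading */
    size_t  length;  /* how many bytes to read */
} HashWindow;

/* Centers a window of `wanted` bytes on the middle of a file that is
   `file_size` bytes long. A file smaller than `wanted` is used whole. */
static HashWindow hash_window(int64_t file_size, size_t wanted)
{
    HashWindow w = {0, 0};

    if (file_size <= 0 || wanted == 0) {
        return w;                      /* nothing to read */
    }
    if ((uint64_t)file_size <= (uint64_t)wanted) {
        w.length = (size_t)file_size;  /* small file: take all of it */
        return w;
    }
    /* wanted < file_size here, so the cast to int64_t cannot overflow */
    w.offset = (file_size - (int64_t)wanted) / 2;
    w.length = wanted;
    return w;
}
```

With an asset representation, `rep.size` would be passed as `file_size`, and the resulting `offset` and `length` would go to `getBytes:fromOffset:length:error:`. Note that for a file smaller than the window the whole file is hashed, headers included, so very small files may still produce mismatched hashes.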

The hashing code for the 1K case looks like this:

    #import <AssetsLibrary/AssetsLibrary.h>
    #import <CommonCrypto/CommonCrypto.h>
    #define HASH_DATA_SIZE  1024  // Read 1K of data to be hashed

    ...

    ALAssetRepresentation *rep = [anAsset defaultRepresentation];
    // Only HASH_DATA_SIZE bytes are read, so the buffer does not need to hold the whole file
    Byte *buffer = (Byte *) malloc(HASH_DATA_SIZE);
    long long offset = rep.size / 2; // begin from the middle of the file
    NSUInteger buffered = [rep getBytes:buffer fromOffset:offset length:HASH_DATA_SIZE error:nil];

    if (buffered > 0)
    {
        // NSData takes ownership of the buffer and frees it when it is deallocated
        NSData *data = [NSData dataWithBytesNoCopy:buffer length:buffered freeWhenDone:YES];

        unsigned char result[CC_MD5_DIGEST_LENGTH];

        CC_MD5([data bytes], (CC_LONG)[data length], result);
        NSString *hash = [NSString stringWithFormat:
                        @"%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X",
                        result[0], result[1], result[2], result[3],
                        result[4], result[5], result[6], result[7],
                        result[8], result[9], result[10], result[11],
                        result[12], result[13], result[14], result[15]
                        ];

        NSLog(@"Hash for image is %@", hash);
    }
    else
    {
        free(buffer); // nothing was read, so release the buffer ourselves
    }
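
The sixteen-argument format string above can also be replaced with a loop. The following is a minimal sketch in plain C, which compiles inside an Objective-C file. `digest_to_hex` is a hypothetical helper name, not a CommonCrypto function, and it produces the same uppercase two-digit output as `%02X`.

```c
#include <stddef.h>

/* Writes 2 * n uppercase hex characters plus a terminating NUL into `out`.
   The caller must provide room for at least 2 * n + 1 bytes. */
static void digest_to_hex(const unsigned char *digest, size_t n, char *out)
{
    static const char digits[] = "0123456789ABCDEF";

    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = digits[digest[i] >> 4];    /* high nibble */
        out[2 * i + 1] = digits[digest[i] & 0x0F];  /* low nibble */
    }
    out[2 * n] = '\0';
}
```

For an MD5 digest, `n` would be `CC_MD5_DIGEST_LENGTH` (16) and `out` a 33-byte buffer, which could then be turned into an NSString with `stringWithUTF8String:`.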

I tried this on about 4000 photos. Hashing the full image with UIImagePNGRepresentation() averaged 0.008 seconds per image, whereas hashing just 1K read from the middle of each file took about 0.00008 seconds, roughly 100 times faster.

Licensed under: CC-BY-SA with attribution