Question

I need to generate MD5 hashes for ~40 million strings with a salt (~20 characters),

md5 (string + salt)

check for collisions, and then save them into a database. Then, when the salt is changed, I need to generate new hashes with the new salt, check for collisions, AND check for collisions with the old hashes.

Are there any libraries to help me here? Maybe a Bloom filter could help? Can I use a HashMap to check for collisions?
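
For reference, here is a minimal sketch of the salted hashing and an in-memory collision check along the lines described above (Java is assumed since HashMap is mentioned; the salt and input strings are placeholders):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class SaltedMd5Check {

    // Hex-encode md5(string + salt).
    static String md5WithSalt(String value, String salt) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest((value + salt).getBytes(StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder(32);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        String salt = "example-salt-20-chars";        // placeholder salt
        String[] inputs = {"alice", "bob", "alice"};  // stand-in for the ~40M strings

        // Map hash -> original string, so a collision between *different* inputs is visible.
        Map<String, String> seen = new HashMap<>();
        for (String s : inputs) {
            String hash = md5WithSalt(s, salt);
            String previous = seen.putIfAbsent(hash, s);
            if (previous != null && !previous.equals(s)) {
                System.out.println("Collision: " + previous + " and " + s + " -> " + hash);
            }
        }
    }
}
```

Note that holding ~40 million 32-character hex strings plus the original inputs in a HashMap will likely take several gigabytes of heap, which is one reason to consider a Bloom filter or a database instead.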


Solution

I suggest inserting the generated MD5 hashes into a database and checking for an existing hash on every iteration of the loop. You can create an index on the hash-value column to make the lookups quicker. Since the table is going to be very big, you can also split the hashes across several tables (or even databases) based on the first character, or first group of characters, of the hash (hashes_aa, hashes_ab, etc.).
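
A minimal sketch of that check-then-insert flow, assuming JDBC with an embedded H2 database (the connection URL, table name, and column sizes are placeholders; any SQL database with an index or primary key on the hash column works the same way):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HashStore {

    public static void main(String[] args) throws SQLException {
        // Placeholder connection string; requires the H2 driver on the classpath.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:./hashes")) {
            try (Statement st = conn.createStatement()) {
                // The primary key on the hash column doubles as the lookup index.
                st.execute("CREATE TABLE IF NOT EXISTS hashes "
                         + "(hash CHAR(32) PRIMARY KEY, source VARCHAR(255))");
            }

            String hash = "0cc175b9c0f1b6a831c399e269772661"; // md5("a"), as a stand-in value
            String source = "a";

            // Check for an existing row with the same hash before inserting.
            try (PreparedStatement check =
                     conn.prepareStatement("SELECT source FROM hashes WHERE hash = ?")) {
                check.setString(1, hash);
                try (ResultSet rs = check.executeQuery()) {
                    if (rs.next()) {
                        String existing = rs.getString(1);
                        if (!existing.equals(source)) {
                            System.out.println("Collision: '" + existing + "' and '"
                                    + source + "' share hash " + hash);
                        }
                        return; // row already present either way, skip the insert
                    }
                }
            }

            try (PreparedStatement insert =
                     conn.prepareStatement("INSERT INTO hashes (hash, source) VALUES (?, ?)")) {
                insert.setString(1, hash);
                insert.setString(2, source);
                insert.executeUpdate();
            }
        }
    }
}
```

Splitting into hashes_aa, hashes_ab, etc. just means picking the table name from the first two hex characters of the hash before running the same check-then-insert pair of statements.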

Are you changing the salt so often that this solution doesn't work for you?

Edit: collisions will be very, very rare; could you just skip the check? With ~40 million 128-bit MD5 values, the birthday-bound probability of any collision is roughly (4 × 10^7)^2 / 2^129, on the order of 10^-24.

Licensed under: CC-BY-SA with attribution