Pergunta

If I have a system where a hash is generated out of a total permutation of 1 million possibilities. If there's a 10% chance of a collision, should I worry about the generating algorithm running 5 times?

  • I have a system similar to jsfiddle, where a user can "save" a file on my server. Now I'm using '23456789abcdefghijkmnopqrstuvwxyz' which is 33 chars, and the file is 4 chars long, for a total of 33^4 = 1,185,921 possibilities.
  • The "filename" is generated randomly and if there's a collision it reruns to get another filename. Using a birthday paradox calculator I can see that after I have 500 entries I have a 10% chance of a collision.
  • What are the chances that I'll get a collision more than 5 times in a row? what about 4?
  • Is there any way to figure this out? Should I worry about it? What happens after 5000 entries?
  • Is there a program out there that can figure this out with any arbitary inputs?
Foi útil?

Solução

I don't think that the birthday paradox calculations apply. There's a difference between the odds of 500 random numbers out of 1185921 being all different and the odds of one new number being different once you have 500 known unique numbers.

If you have 500 assigned numbers and generate a new number at random, it will have odds of 500/1185921 of being a collision. With 500 names taken, the chances of 4 collisions in a row are (500/1185921)4 < 10-13. With 5000 existing file names, the odds of a new name being a collision are 5000/1185921, and the chance of 4 collisions in a row are < 10-9.

Outras dicas

My math is a little rusty so bear with me.
The chance of getting x collisions in a row is simply:

chance of collision ^ x;  

Where the chance of collision is:

entries/space (which is 500/1185921 or 0.04%).

You can see above that this will get worse with the more entries (and better with a bigger space).

Also note the birthday paradox is perhaps not quite what you want. The 10% chance is the chance that any two entries will have had a collision, not the chance of a collision for the next entry.

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top