Preventing preimage attack on limited set of values

https://stackoverflow.com/questions/4302252

hash
sha256

29-09-2019

Question

I have asked about the cost of running a preimage attack on the hashes of social security numbers. The excellent answer I got was that the set of possible social security numbers contains only about 366,000,000 values, which would make it easy to run a preimage attack.

My question now is whether it is possible to avoid a preimage attack altogether. In my scenario, several clients need to store social security numbers on a central server. The hashing must be consistent across clients. The clients can communicate with online web services.

Solution

Your problem is similar to what must be done with passwords. Passwords must fit in human brains and, as such, cannot be very difficult to guess.

There are two complementary ways to mitigate risks when using low-entropy secrets:

Use iterated/repeated hashing to make each "guess" more expensive for the attacker.
Use salts to prevent cost sharing, so that the attacker must pay for a full dictionary attack on every single attacked password/SSN.

One way to make hashing more expensive is to hash the concatenation of n copies of the data, with n as large as possible (depending on the computing power of the clients and, ultimately, the patience of the user). For instance, for the (dummy) SSN "123456789", use H(123456789123456789123456789...123456789). Here n would be counted in millions: on a basic PC, SHA-256 can easily process a hundred megabytes per second, so ten million copies of a 9-byte SSN (90 MB) take roughly one second.
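A minimal Python sketch of the repeated-hashing idea, assuming SHA-256 from the standard `hashlib` module and an SSN encoded as ASCII bytes. The function name `repeated_hash` is hypothetical. It feeds the copies to the hash in blocks rather than building the whole n-copy string in memory; the digest is the same either way, because SHA-256 only sees a stream of bytes.

```python
import hashlib

def repeated_hash(data: bytes, n: int) -> str:
    """SHA-256 hex digest of n concatenated copies of `data`.

    Equivalent to hashlib.sha256(data * n).hexdigest(), but the input is
    streamed in ~64 KiB blocks so memory stays small when n is in the millions.
    """
    if n < 1 or not data:
        raise ValueError("need n >= 1 and non-empty data")
    per_block = max(1, 65536 // len(data))   # copies per streamed block
    block = data * per_block
    full_blocks, remainder = divmod(n, per_block)
    h = hashlib.sha256()
    for _ in range(full_blocks):
        h.update(block)
    h.update(data * remainder)               # leftover copies (may be empty)
    return h.hexdigest()

# Dummy SSN from the text, hashed as 2,000,000 concatenated copies.
print(repeated_hash(b"123456789", 2_000_000))
```

Whatever n is chosen must be a fixed constant shared by all clients, since the hashes have to be consistent between them. Timing this Python code is only a rough guide: an attacker will use optimized native or GPU code, so the real per-guess cost comes from the number of bytes hashed, not from interpreter overhead.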

A salt is a piece of public data which is used along with the data to hash (the SSN), and which is different for each user. A salt need not be secret, but it should not be reused (or at least not often). Since SSNs tend to be permanent (an individual keeps the same SSN for his whole life), you can use the user name as the salt. This contrasts with passwords: a user can change his password, and a new salt should be used for every new password. Hence, if user Bob Smith has SSN 123456789, you would end up using H("Bob Smith 123456789 Bob Smith 123456789 Bob Smith 123456789... Smith 123456789"), with enough repetitions to make the process sufficiently slow.
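The salted construction can be sketched the same way. This is a hypothetical helper, `salted_ssn_hash`, assuming UTF-8 encoding and a single space between repetitions, which is one reading of the H("Bob Smith 123456789 Bob Smith 123456789 ...") example; `N_COPIES` is an illustrative value, not a recommendation.

```python
import hashlib

def salted_ssn_hash(user_name: str, ssn: str, n: int) -> str:
    """SHA-256 of "<name> <ssn>" repeated n times, joined by single spaces.

    Byte-for-byte equal to hashing " ".join([f"{user_name} {ssn}"] * n),
    but streamed so the full string is never built in memory.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    unit = f"{user_name} {ssn}".encode("utf-8")   # assumed encoding: UTF-8
    h = hashlib.sha256(unit)                       # first copy, no separator
    sep_unit = b" " + unit
    for _ in range(n - 1):                         # remaining n - 1 copies
        h.update(sep_unit)
    return h.hexdigest()

N_COPIES = 1_000_000  # hypothetical work factor; must be identical on every client
print(salted_ssn_hash("Bob Smith", "123456789", N_COPIES))
```

Every client must agree byte-for-byte on the input: the same encoding, the same separator, and the same normalization of the user name (for example, "Bob Smith" and "bob smith" give unrelated hashes).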

Assuming you can make the user wait for one second (it is difficult to make a user wait longer) on a not-so-new computer, you can expect that even a determined attacker will have trouble trying more than a few hundred SSNs per second. The cost of cracking a single SSN will then be counted in weeks and, thanks to the use of the user name as a salt, the attacker will have no shortcut: salting defeats precomputed tables, including the much-hyped "rainbow tables".
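As a rough worked figure, take the 366,000,000 candidate values from the question and assume, for illustration only, an attacker rate of 300 guesses per second (one reading of "a few hundred"):

```latex
T_{\text{full}} = \frac{3.66 \times 10^{8}\ \text{guesses}}{300\ \text{guesses/s}}
  \approx 1.22 \times 10^{6}\ \text{s} \approx 14\ \text{days},
\qquad
T_{\text{avg}} \approx \tfrac{1}{2}\, T_{\text{full}} \approx 7\ \text{days}
```

Because the user name salts each hash, this cost is paid again for every targeted user rather than once for the whole database.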

Licensed under: CC-BY-SA with attribution
