Question

I have made a website that generates hashes from plain text entered by users. A user can enter a word or phrase and select MD5, SHA1, or both. The site then takes this input, converts it into MD5 and SHA1, stores it in a database, and returns it to the user. Users can also enter a hash into a search bar, and if the database contains that hash it returns the original word. The aim is to build crowd-sourced hash tables while providing a benefit to users. The main functionality works, but I have a dilemma about what I should sanitize: I want users to be able to enter special characters, since that improves the chances of the search function returning a result. Any advice would be appreciated, thanks.

Solution

You shouldn't have to sanitise anything if it's just going to be hashed, since hashing functions are generally not vulnerable to injection attacks.

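You should, of course, protect every database query that includes user input (for both insertion and searching). With parameterised queries this is done for you, because the driver passes the values separately from the SQL text and nothing needs to be escaped by hand. Only an antiquated database API that doesn't support parameterised queries (e.g. the mysql_* functions; avoid these) forces you to escape manually.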
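
As a rough illustration of the parameterised-query point, here is a minimal Python sketch using the standard sqlite3 module. The table name `hashes`, its columns, and the helper `store_and_lookup` are assumptions made up for this example, not the asker's actual schema; the same pattern applies to prepared statements in other database drivers.

```python
import hashlib
import sqlite3


def store_and_lookup(conn, plaintext):
    """Insert plaintext with its MD5 hash, then look it up by hash.

    Hypothetical helper for illustration only.
    """
    digest = hashlib.md5(plaintext.encode("utf-8")).hexdigest()
    # The ? placeholders pass values separately from the SQL text,
    # so quotes or semicolons in plaintext are stored as plain data.
    conn.execute(
        "INSERT OR IGNORE INTO hashes (md5, plaintext) VALUES (?, ?)",
        (digest, plaintext),
    )
    row = conn.execute(
        "SELECT plaintext FROM hashes WHERE md5 = ?", (digest,)
    ).fetchone()
    return row[0]


# In-memory database with an assumed, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hashes (md5 TEXT PRIMARY KEY, plaintext TEXT)")

# Input that would break a query built by string concatenation.
tricky = "Robert'); DROP TABLE hashes;--"
result = store_and_lookup(conn, tricky)
```

The special characters come back unchanged and the table survives, which is what the asker wants: no characters have to be stripped from the input.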

Other tips

I see no reason for your application to sanitize anything. All you're doing with your users' input is feeding it to a cryptographic hash function, and those functions will happily accept any byte sequences.
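
To make the "any byte sequence" claim concrete, here is a small Python sketch using the standard hashlib module. The helper name `hash_both` and the sample input are assumptions chosen for illustration.

```python
import hashlib


def hash_both(data: bytes) -> dict:
    """Return hex MD5 and SHA-1 digests of an arbitrary byte string."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


# Quotes, angle brackets, NUL bytes and invalid UTF-8 are all just bytes
# to the hash function; nothing here needs to be filtered out first.
awkward = b"'; <script> \x00\xff\xfe --"
digests = hash_both(awkward)
```

Whatever the input, the output is a fixed-length hexadecimal string (32 characters for MD5, 40 for SHA-1), so the digest itself never carries the special characters of the input.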

Of course, if you're displaying the input string on the result page, you should escape it with htmlspecialchars() before embedding it in HTML code. Similarly, if you're including it as a parameter in a URL you should escape it with urlencode(), and if you're storing it in an SQL database, you should escape it with the appropriate escaping function for your database driver (e.g. mysqli::escape_string()), or just use prepared SQL statements with placeholders.
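
The output-escaping advice can be sketched in Python as follows, with `html.escape` and `urllib.parse.quote_plus` standing in for `htmlspecialchars()` and `urlencode()`. The function `render_result`, the `/search?q=` URL, and the placeholder digest are hypothetical and exist only for this example.

```python
import html
from urllib.parse import quote_plus


def render_result(plaintext, digest):
    """Build an HTML fragment that shows the input next to its hash."""
    # Escape for the HTML context: &, <, > and both quote characters.
    safe_text = html.escape(plaintext)
    # Escape for the URL query-string context.
    safe_query = quote_plus(plaintext)
    return (
        f"<p>{safe_text} = {digest}</p>"
        f'<a href="/search?q={safe_query}">search again</a>'
    )


# "abc123" is a placeholder, not a real digest.
page = render_result('<script>alert("x")</script>', "abc123")
```

The stored value stays exactly as the user typed it; escaping happens only at the point of output, and each context (HTML body, URL) gets its own escaping function.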

Also note that cryptographic hash functions operate on byte strings, not on character strings. This means that, especially for text containing non-ASCII characters, the hash value will depend on the character encoding used to encode it into bytes. For Unicode text, it may also depend on the normalization form used. UTF-8 (with normalization form C or D, or just whatever the user's browser sends) may be a reasonably common choice these days, but if you want to be helpful, you may want to offer your users a choice of different encodings.
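
The effect of encoding and normalization on the hash can be seen in a short Python sketch. The helper `md5_variants` and the choice of Latin-1 as the alternative encoding are assumptions for illustration; a real site might offer a different set of encodings.

```python
import hashlib
import unicodedata


def md5_variants(text):
    """Hash the same text under several byte representations."""
    variants = {}
    for form in ("NFC", "NFD"):
        normalized = unicodedata.normalize(form, text)
        variants[f"utf-8/{form}"] = hashlib.md5(
            normalized.encode("utf-8")
        ).hexdigest()
    nfc = unicodedata.normalize("NFC", text)
    try:
        variants["latin-1/NFC"] = hashlib.md5(
            nfc.encode("latin-1")
        ).hexdigest()
    except UnicodeEncodeError:
        # Text contains characters that Latin-1 cannot represent.
        pass
    return variants


# "café" written with an escape so the literal is unambiguous.
variants = md5_variants("caf\u00e9")
for label, digest in variants.items():
    print(label, digest)
```

For "café" the three representations are different byte strings and therefore give three different digests, while for pure ASCII text such as "abc" all three coincide. A lookup table that stores only one representation will miss hashes computed from the others.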

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow