Question

After reading this excellent, eye-opening article by a security expert, I have become quite suspicious of incoming strings, because mysql_real_escape_string can be tricked...

The problem stems strictly from multi-byte character sets such as GBK. If the user input is not multi-byte, there is no issue: mysql_real_escape_string is good enough against SQL injection, provided that you do your basic data type validation properly.

I'm not saying multi-byte is evil, but if you do not have to deal with multi-byte situations, then don't. Stick to UTF-8 if that works for you and stay in UTF-8 all the time. But the question is how? After all, it is the user who starts the process by sending you a non-UTF-8 string, perhaps a multi-byte string in an encoding like GBK.

How do you make sure that you can successfully and reliably reject that user input? From what I have read, it is impossible to know which character set an incoming user string is in. Then what?

In other words, how do you make sure that you are working with UTF-8 user strings? I'm asking because the PHP filter/sanitization functions are all designed to deal with UTF-8 input and don't know how to deal with other multi-byte encodings. As the article points out, the protection measure itself becomes the cause of the failure.

Oh, and please don't just say "use prepared statements"; I'm already aware of that excellent option.

Solution

This excellent eye-opener article was written almost a decade ago and has become a little obsolete.
Things have improved a little since then.
PHP gained a function that controls mysql_real_escape_string() and makes it really escape "taking into account the current character set of the connection", as the documentation says.

The problem stems not from multi-byte character sets such as GBK as such, but from character set misinterpretation: the escaping code assumes one character set while the server reads the bytes in another. So you just have to tell MySQL which character set you are working with. That means there is no point in detecting multi-byte strings at all.
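
To make the misinterpretation concrete, here is a minimal byte-level sketch. It is written in Python rather than PHP only because the effect is language-independent. The naive_escape() helper is hypothetical: it is a charset-unaware stand-in for addslashes-style escaping, not MySQL's actual implementation (which also escapes a few more bytes). The payload is the classic 0xBF 0x27 pair.

```python
def naive_escape(raw: bytes) -> bytes:
    """Prefix quote and backslash bytes with a backslash, byte by byte.

    Hypothetical stand-in for escaping that does not know the
    connection character set.
    """
    out = bytearray()
    for b in raw:
        if b in (0x27, 0x22, 0x5C):  # single quote, double quote, backslash
            out.append(0x5C)
        out.append(b)
    return bytes(out)


payload = b"\xbf\x27 OR 1=1 -- "
escaped = naive_escape(payload)  # the 0x27 is now preceded by 0x5C

# Read as a single-byte charset, the quote is still escaped.
as_latin1 = escaped.decode("latin-1")

# Read as GBK, 0xBF 0x5C fuses into one two-byte character,
# which leaves the quote that follows it unescaped.
as_gbk = escaped.decode("gbk")

print(repr(as_latin1))
print(repr(as_gbk))
```

The escaped bytes are identical in both cases; only the interpretation differs. An escaping routine that knows the connection is GBK would treat 0xBF 0x27 according to GBK's own rules instead of inserting a byte that completes a multi-byte character.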

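So, just set the proper character set using mysql_set_charset() (rather than a SET NAMES query, which changes the character set on the server without telling the client library) and you will be safe.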

Here is a little demo I wrote on the topic.

Also keep in mind that not every multi-byte encoding is vulnerable. UTF-8 is pretty safe. Otherwise we would be suffering a zillion injections today.
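
One way to see why UTF-8 is not affected: every byte of a multi-byte UTF-8 sequence has its high bit set, so an ASCII byte such as the backslash (0x5C) or the quote (0x27) can never be swallowed into a longer character. The sketch below, again in Python as a language-neutral illustration, checks this exhaustively. Both function names are made up for this example.

```python
def utf8_multibyte_is_ascii_free() -> bool:
    """Return True if no non-ASCII code point encodes to a byte below 0x80."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        if any(b < 0x80 for b in chr(cp).encode("utf-8")):
            return False
    return True


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(utf8_multibyte_is_ascii_free())
print(is_valid_utf8(b"\xbf\x27"))  # the GBK-style payload bytes
```

The second check shows that the 0xBF 0x27 payload is not even well-formed UTF-8, since 0xBF can only appear as a continuation byte. In GBK, by contrast, the second byte of a character may fall in the ASCII range, which is what makes the fusion with 0x5C possible.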

Licensed under: CC-BY-SA with attribution