Question

PHP uses "magic quotes" by default but has gotten a lot of flak for it. I understand it will disable it in next major version of PHP.

While the arguments against it makes sense, what I don't get it is why not just use the HTML entities to represent quotes instead of stripping and removing slashes? After all, a VAST majority of mySQL is used for outputting to web browsers?

For example, ' is used instead of ' and it won't affect the database at all.

Another question, why can't PHP just have configurations set up for each version of PHP with this tag <?php4 or <?php5 so appropriate interpreters can be loaded for those versions?

Just curious. :)

Was it helpful?

Solution

Here's a good reason, mostly in response to your own posted answer: Using htmlspecialchars() or htmlentities() does not make your SQL query safe. That's what mysql_real_escape_string() is for.

You seem to be making the assumption that it's only the single and double quote characters that pose a problem. MySQL queries are actually vulnerable to the \x00, \n, \r, \, ', " and \x1a characters in your data. If you are not using prepared statements or mysql_real_escape_string(), then you have an SQL injection vulnerability.

htmlspecialchars() and htmlentities() do not convert all of these characters, ergo you cannot make your query safe by using these functions. To that end, addslashes() does not make your query safe either!

Other smaller downsides include what the other posters have already mentioned about MySQL not always being used for web content, as well as the fact that you are increasing the amount of storage and index space needed for your data (consider one byte of storage for a quote character, versus six or more bytes of storage for its entity form).

OTHER TIPS

Putting &#039; into a string column in a database would be fine, if all you use database content for is outputting to a web page. But that's not true.

It's better to escape output at the time you output it. That's the only time you know for sure that the output is going to a web page -- not a log file, an email, or other destination.

PS: PHP already turns magic quotes off by default in the standard php.ini file. It's deprecated in PHP 5.3, and it will be removed from the language entirely in PHP 6.0.

I will reply to your first question only.

Validation of input is a wrong approach anyway, because it's not input that matters, the problem is where it's used. PHP can't assume that all input to a MySQL query would be output to a context where a HTML Entity would make sense.

It's nice to see that magic_quotes is going; it's the cause of a lot of security issues with PHP, and it's nice to see them taking a new approach :)

You'll do yourself a big favour if you reframe your validation approaches to validate on OUTPUT, for the context you are working in. Only you, as the programmer, can know this.

The reason that MySQL doesn't convert ' to &#039; is because &#039; is not '. If you want to convert your data for output, you should be doing that at the view layer, not in your database. It's really not very hard to just call htmlentities before/when you echo.

Thanks everyone. I had to REALLY think what you meant and the implications it may have if I change the quotes to HTML entities instead of adding slashes to them but again, isn't that actually changing the output/input too?

I cannot think of a reason why we CANNOT or SHOULDN'T use HTML entities for mySQL as long as we make it clear that all data is encoded using HTML entities. After all, my argument is based on a fact that the majority of mySQL is used for outputting to HTML browsers and also the fact that ' and " and / can seriously harm mySQL databases. So, isn't it actually SAFER to encode ' and " and / as HTML entities before sending them as INSERT queries? Also, we're going XML so why waste time writing htmlentities and stripslashes and addslashes when accessing data that's ALREADY encoded in HTML entities?

You can't just convert ' to &#039;. Think about it: what happens when you want to store the string "&#039;"? If you store &#039; then when you load the page it will display ' and not &#039;.

So now you have to convert ALL HTML entities, not just quotes. Then you start getting into all sorts of weird conversion problems. The simplest solution is to just store the real data in the database, then you can display it how you like. You might want to use the actual quotes - in most cases " and ' don't do any harm outside of the tag brackets.

Sometimes you may want to store actual HTML in a field and display it raw (as long as it's checked and sanitized on its way in/out.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top