Question

I know this topic has been discussed quite extensively, as I've gone through and read more than 15 posts on the subject, but still can't find an answer to my question.

I'm looking for a function to sanitize data from a form. As absolutely NO HTML will be acceptable, how do I go about escaping ALL html entities so the user can absolutely not inject anything? I don't need a white list, as no input HTML is allowed.

Also, there's no need to run mysql_real_escape_string, as I don't use a MySQL database. I use MongoDB. I'm just storing first name, last name, phone numbers, basic stuff. No HTML. But I still don't want a user to be able to enter <script>whatever</script> as their first name and have the browser parse it when it's displayed back to them.

I thought about HTML Purifier and htmLawed, but they seem to be too much for what I'm trying to do. Do I just build a fancy preg_replace function?

Solution

There is no universal "make it safe" filter. Strings are only dangerous when placed into a specific context.

For example, if the context is a plain text document, you don't really have any worries.

htmlspecialchars is enough if the context is a text node (that is, not within the angle brackets of a tag). Pass the correct charset/encoding, which should be the same one your server declares in its HTTP headers.

This is OK:

   <p><?= htmlspecialchars($input, ENT_QUOTES, 'UTF-8'); ?></p>
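To see what that call does to the <script> input from the question, here is a minimal sketch. The helper name escape_text is made up for illustration; only htmlspecialchars itself is PHP's own.

```php
<?php
// Hypothetical helper (not from the original answer): escape a value
// for output inside an HTML text node.
function escape_text(string $input): string
{
    // ENT_QUOTES also converts ' and "; 'UTF-8' must match the page charset.
    return htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
}

$firstName = '<script>alert(1)</script>';
echo '<p>' . escape_text($firstName) . '</p>', PHP_EOL;
// prints <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The browser shows the tag as literal text instead of running it, and nothing the user typed is lost.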

But if you need to output inside the angle brackets of a tag, so that the context is an HTML attribute, as in:

<p <?= htmlspecialchars($input, ENT_QUOTES, 'UTF-8'); ?>   ></p>
or
<p title="<?= htmlspecialchars($input, ENT_QUOTES, 'UTF-8'); ?>"   ></p>

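then the "make it safe" task becomes much harder. In the first form the value is unquoted, so input that contains no special characters at all (an event-handler attribute, for example) passes through htmlspecialchars unchanged. The quoted form is fine for a plain attribute such as title when ENT_QUOTES is used, but not for attributes whose value is interpreted as a URL or as script, such as href or onclick. On top of that, legacy browsers have some absolutely bewildering bugs that defy common expectations of software developers. You would be foolish not to stand on the shoulders of giants and make use of something like HTML Purifier.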
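A small sketch of why the two attribute forms behave differently. The render_unquoted and render_quoted helpers are hypothetical, written only for this illustration, and the payload is just one example of input that contains nothing for htmlspecialchars to escape.

```php
<?php
// Hypothetical render helpers, for illustration only.
function render_unquoted(string $input): string
{
    return '<p ' . htmlspecialchars($input, ENT_QUOTES, 'UTF-8') . '></p>';
}

function render_quoted(string $input): string
{
    return '<p title="' . htmlspecialchars($input, ENT_QUOTES, 'UTF-8') . '"></p>';
}

// Contains none of & < > " ' so htmlspecialchars leaves it untouched.
$input = 'onmouseover=alert(1)';

echo render_unquoted($input), PHP_EOL;
// prints <p onmouseover=alert(1)></p>   (a live event handler)

echo render_quoted($input), PHP_EOL;
// prints <p title="onmouseover=alert(1)"></p>   (just a tooltip string)

// An attempt to break out of the quoted value is neutralised by ENT_QUOTES.
echo render_quoted('" onmouseover="alert(1)'), PHP_EOL;
// prints <p title="&quot; onmouseover=&quot;alert(1)"></p>
```

So escaping alone is not the whole story: where the value lands in the markup decides whether it is safe.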

Other tips

I'm no expert on such things, but couldn't you just str_replace the angle brackets?

I would say use preg_replace but you'd need to be careful of accents and other uncommon characters that can appear in a person's name.
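Roughly what those two ideas could look like, as a sketch only. Both function names are invented for this example, and the set of characters allowed in a name is an assumption you would need to adjust for your own data.

```php
<?php
// Hypothetical helpers, named here only for illustration.

// Drop the angle brackets themselves; everything else is kept.
function strip_angle_brackets(string $input): string
{
    return str_replace(['<', '>'], '', $input);
}

// Assumed whitelist for names: Unicode letters, combining marks, whitespace,
// apostrophe, period and hyphen. The /u modifier makes the pattern
// UTF-8 aware, so accented letters survive.
function keep_name_chars(string $input): string
{
    // preg_replace returns null on error (e.g. invalid UTF-8); fall back to ''.
    return preg_replace('/[^\p{L}\p{M}\s\'.\-]/u', '', $input) ?? '';
}

echo strip_angle_brackets('<script>alert(1)</script>'), PHP_EOL;
// prints scriptalert(1)/script

echo keep_name_chars("José O'Neil-Smith"), PHP_EOL;
// prints José O'Neil-Smith
```

Note that both variants throw data away (a name like >3 becomes 3), which escaping on output does not.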

Define sanitize: Do you want to escape the angle brackets or do you want to remove HTML tags?

To escape, take a look at

htmlentities() 

To remove, have a look at

strip_tags()
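A quick side-by-side of the two, as a sketch. The sample input is made up, and the source file is assumed to be saved as UTF-8.

```php
<?php
$input = '<b>José</b> <script>alert(1)</script>';

// Escaping keeps everything the user typed, but as inert text.
// htmlentities also converts characters such as é to named entities.
$escaped = htmlentities($input, ENT_QUOTES, 'UTF-8');

// Stripping removes the tags but keeps whatever was between them.
$stripped = strip_tags($input);

echo $escaped, PHP_EOL;
// prints &lt;b&gt;Jos&eacute;&lt;/b&gt; &lt;script&gt;alert(1)&lt;/script&gt;

echo $stripped, PHP_EOL;
// prints José alert(1)
```

Note that strip_tags leaves the script body behind as plain text, so the result may still look odd even though it is no longer markup.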

One I like to use, which converts all HTML special characters into entities so that they are displayed rather than parsed as part of the page, is:

htmlspecialchars($string);

It's never let me down yet. It saves you from writing complex and slow replacement functions, and it also means the user can use > in their username or comment without it being removed (for example, >3 is a perfectly valid username on the internet).
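For instance, with the >3 username (a minimal sketch). One caveat worth knowing: the default flags of htmlspecialchars changed in PHP 8.1 to include ENT_QUOTES, so on older versions the one-argument call leaves single quotes alone. Passing the flags and charset explicitly avoids depending on that.

```php
<?php
$username = '>3';
echo htmlspecialchars($username), PHP_EOL;
// prints &gt;3

// Explicit flags and charset behave the same on every PHP version.
$comment = "It's <b>bold</b>";
$safe = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');
echo $safe, PHP_EOL;
// prints It&#039;s &lt;b&gt;bold&lt;/b&gt;
```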

What about looking into PHP's Data Filtering, http://php.net/manual/en/book.filter.php

Sanitize filters: http://php.net/manual/en/filter.filters.sanitize.php

If you really want a solid and safe library, check out OWASP's ESAPI for PHP

Don’t write your own security controls! Reinventing the wheel when it comes to developing security controls for every web application or web service leads to wasted time and massive security holes. The OWASP Enterprise Security API (ESAPI) Toolkits help software developers guard against security‐related design and implementation flaws.

Use PHP's filter_input (part of the filter extension, available since PHP 5.2): http://php.net/manual/en/function.filter-input.php

$string = filter_input(INPUT_POST, 'string', FILTER_SANITIZE_SPECIAL_CHARS);

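This is pretty much like reading $_POST['string'], but with a built-in cleaner: FILTER_SANITIZE_SPECIAL_CHARS HTML-encodes ' " < > & and characters with an ASCII value below 32.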
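filter_input only sees real request data, so to try the filter outside a web request you can apply the same constant with filter_var. A small sketch, with a made-up sample value:

```php
<?php
// In a web request you would read the field directly:
//   $string = filter_input(INPUT_POST, 'string', FILTER_SANITIZE_SPECIAL_CHARS);
// filter_input() returns null if the field is not set and false if the filter fails.

// filter_var() applies the same filter to a value you already have,
// which makes it easy to experiment from the command line.
$raw   = '<script>alert(1)</script>';
$clean = filter_var($raw, FILTER_SANITIZE_SPECIAL_CHARS);

// No raw angle brackets are left in the result.
var_dump(strpos($clean, '<') === false && strpos($clean, '>') === false);
// prints bool(true)
```

Like htmlspecialchars, this encodes the characters rather than removing them, so the original text is still shown to the user.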

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow