Question

I'm still learning PHP and SQL. I'm trying to create a simple content management system for a website's list of events. All of the input form fields are either text areas or text boxes (yes, I want them that way), and I want to let the user add HTML links in addition to text in these fields. The following functions seem like a good place to start with sanitizing the input I get from the user, but since I'm new to this I wanted to get the opinions of more knowledgeable developers. What more should I be doing to protect the database?

P.S. Thanks to CSS-Tricks for these functions.

function cleanInput($input) {

    $search = array(
         '@<script[^>]*?>.*?</script>@si',   // Strip out javascript
         '@<style[^>]*?>.*?</style>@siU',    // Strip style tags properly
         '@<![\s\S]*?--[ \t\n\r]*>@'         // Strip multi-line comments
    );

    $output = preg_replace($search, '', $input);
    return $output;
}

function sanitize($input) {
    if (is_array($input)) {
        $output = array();
        foreach ($input as $var => $val) {
            $output[$var] = sanitize($val);
        }
    }
    else {
        if (get_magic_quotes_gpc()) {
            $input = stripslashes($input);
        }
        $input  = cleanInput($input);
        $input  = htmlentities($input);
        $output = mysql_real_escape_string($input);
    }
    return $output;
}

Solution

Your cleanInput function can be bypassed quite easily:

$testinput = "<script>alert('p0wned');</script >\n
    <a href='http://example.org' onclick=\"alert('p0Wned again!')\">Click me!</a>";

var_export(cleanInput($testinput));
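To make the bypass concrete, here is a minimal self-contained sketch. It copies cleanInput from the question so that it runs on its own, and checks whether the function changes the hostile input at all. The variable name $cleaned is only illustrative.

```php
<?php
// cleanInput() copied from the question so this snippet is self-contained.
function cleanInput($input) {
    $search = array(
        '@<script[^>]*?>.*?</script>@si',   // Strip out javascript
        '@<style[^>]*?>.*?</style>@siU',    // Strip style tags
        '@<![\s\S]*?--[ \t\n\r]*>@'         // Strip multi-line comments
    );
    return preg_replace($search, '', $input);
}

$testinput = "<script>alert('p0wned');</script >\n"
    . "<a href='http://example.org' onclick=\"alert('p0Wned again!')\">Click me!</a>";

$cleaned = cleanInput($testinput);

// The closing tag is written "</script >" (with a space), which browsers accept
// but the first regex does not match, and no pattern looks at onclick at all.
var_export($cleaned === $testinput);   // prints: true
```

Both payloads survive: the script element because of the space before the closing bracket, and the inline event handler because nothing in the function inspects attributes.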

Also, htmlentities is almost always the wrong thing to use: unless it is told the correct character set, it can mangle UTF-8 input. You should also not be storing HTML-escaped data in your database. I'm not even sure why you use it here at all; won't you have to unescape the HTML to display it?

However, you are going about this the wrong way.

  1. Do not parse or sanitize HTML with regexes. Use a real HTML parser such as DOMDocument, html5lib, or even tidylib. Unfortunately PHP doesn't seem to have anything as wonderful as Bleach on Python, so you will have to roll your own. An XSLT stylesheet with a whitelist seems like it might be a good way to handle this particular sanitization condition. Update: another user pointed out HTML Purifier, which is also a whitelist-based HTML sanitizer. I've never used it, but it looks like "Bleach in PHP". You should definitely investigate.
  2. Prefer escaping to sanitization. PHP culture has an obsession with sanitization, which is really just plain wrong. Escape data at the boundaries of your application (output and database). In the core of your application, your data should be in its native form without any escaping.
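As a rough illustration of the first point, the sketch below uses DOMDocument to keep only text and a elements with an http or https href, and drops everything else. The function names sanitizeLinksOnly and cleanChildren are hypothetical, the whitelist is an assumption based on the question's "text plus links" requirement, and the code has not been hardened; a maintained library such as HTML Purifier is likely the safer choice in production.

```php
<?php
// Hypothetical helper: recursively enforce the whitelist below $parent.
function cleanChildren(DOMNode $parent) {
    foreach (iterator_to_array($parent->childNodes) as $node) {
        if ($node->nodeType === XML_TEXT_NODE) {
            continue; // plain text is kept; saveHTML() escapes it on output
        }
        if (!($node instanceof DOMElement)
            || in_array($node->nodeName, array('script', 'style'), true)) {
            $parent->removeChild($node); // comments, scripts, styles: drop entirely
            continue;
        }
        cleanChildren($node);
        if ($node->nodeName === 'a') {
            // Keep only href, and only when its scheme is http or https.
            foreach (iterator_to_array($node->attributes) as $attr) {
                $scheme = strtolower((string) parse_url($attr->nodeValue, PHP_URL_SCHEME));
                $keep = $attr->nodeName === 'href'
                    && in_array($scheme, array('http', 'https'), true);
                if (!$keep) {
                    $node->removeAttribute($attr->nodeName);
                }
            }
            continue;
        }
        // Any other element: keep its already-cleaned children, drop the tag.
        while ($node->firstChild) {
            $parent->insertBefore($node->firstChild, $node);
        }
        $parent->removeChild($node);
    }
}

// Hypothetical entry point: returns an HTML fragment containing text and links only.
function sanitizeLinksOnly($html) {
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $doc = new DOMDocument();
    // The meta tag is a common way to make the parser read the input as UTF-8.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    cleanChildren($body);

    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$dirty = "<script>alert('p0wned');</script >\n"
    . "<a href='http://example.org' onclick=\"alert('p0Wned again!')\">Click me!</a>";
echo sanitizeLinksOnly($dirty), "\n";
```

The design choice here is to list what is allowed rather than what is forbidden, so unknown tags and attributes are removed by default instead of slipping through.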

A general outline of processing is like so:

  1. Input

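    1. Turn off magic quotes in your PHP settings. Include code at the top of your app to fail hard if it's on: if (get_magic_quotes_gpc()) die ('TURN OFF MAGIC QUOTES!!!!'); (Magic quotes were removed in PHP 5.4 and get_magic_quotes_gpc() in PHP 8.0, so this step only applies to older installations.)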
    2. Validate and normalize/sanitize specific fields of your input according to the expected type of each field. For example, a "dollar amount" has different validation criteria than a whitelisted html fragment field. (Probably you should find and use a validation library.)
    3. If there are errors, send them back to the user with an appropriate HTTP response code.
    4. Save your data to the database using a database library that supports parameter binding, such as PDO with prepared statements. This way you do not need to remember to escape data by hand.
    5. On success, redirect (code 303) to a page displaying the created or modified record.
  2. Output

    1. Retrieve data from the database.
    2. Feed the data to a template which is PHP code that only deals with html display of data structures. It should not know details of how that data is retrieved or contain any "application-driving" behavior. Treat a template like a function that accepts a data structure and returns a string.
    3. Escape your data inside your template. Individual fields of your data will need to be escaped differently. You almost always need to run it through htmlspecialchars before output; the only case you would not do that is when the data you need to display is already html (i.e. your whitelist-sanitized html fields). Define a helper function like this and use it in your templates:

      function h($str) {
          return htmlspecialchars($str, ENT_QUOTES, 'utf-8');
      }
      

      Even better, try to use a template library that automatically escapes strings for you and that requires you to turn off escaping explicitly. (The common case should be simple to avoid errors, and having to escape is the common case!)

    4. Your html page is the string returned from your template. You may now display it to the user.
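The outline above can be sketched end to end as follows. This is only an illustration under stated assumptions: an in-memory SQLite database (which needs the pdo_sqlite extension) stands in for the real one, the events table and the functions validateTitle, saveEvent, loadEvent, and renderEvent are hypothetical names, and the HTTP parts (error response, 303 redirect) are reduced to comments.

```php
<?php
// Output escaping helper, as defined in the answer.
function h($str) {
    return htmlspecialchars($str, ENT_QUOTES, 'UTF-8');
}

// Input step: validate one field according to its expected type.
// Returns a list of error messages (empty when the input is acceptable).
function validateTitle($title) {
    $errors = array();
    if (trim($title) === '') {
        $errors[] = 'Title is required.';
    } elseif (strlen($title) > 200) {
        $errors[] = 'Title is too long.';
    }
    return $errors;
}

// Database boundary: bound parameters, no manual escaping.
function saveEvent(PDO $pdo, $title) {
    $stmt = $pdo->prepare('INSERT INTO events (title) VALUES (:title)');
    $stmt->execute(array(':title' => $title));
    return (int) $pdo->lastInsertId();
}

function loadEvent(PDO $pdo, $id) {
    $stmt = $pdo->prepare('SELECT id, title FROM events WHERE id = :id');
    $stmt->execute(array(':id' => $id));
    return $stmt->fetch(PDO::FETCH_ASSOC);
}

// Output boundary: the template is a function from data to an HTML string.
function renderEvent(array $event) {
    return '<h2>' . h($event['title']) . '</h2>';
}

// Assumption: in-memory SQLite replaces the real database for this sketch.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT NOT NULL)');

$title = "Bob's <script>alert(1)</script> party'); DROP TABLE events; --";
$errors = validateTitle($title);
if ($errors === array()) {
    $id = saveEvent($pdo, $title);  // a real app would now redirect with a 303
    $event = loadEvent($pdo, $id);  // stored in native form, unescaped
    $page = renderEvent($event);    // escaped only here, at output time
    echo $page, "\n";
}
// Otherwise a real app would send $errors back with an appropriate status code.
```

Note that the hostile title is stored exactly as typed; it is harmless in the database because it was bound as a parameter, and harmless in the page because the template escapes it.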

Other tips

While you don't have to sanitize your own string data that you display in the browser or store in a database, you must sanitize all user input that your website obtains through INPUT elements, TEXTAREA elements, from the keyboard via JavaScript/DOM Events, from uploaded files, and from all the other sources I've forgotten to list.

While database sanitizing is well documented, and partially enforced in the latest versions of server-side languages like PHP, there is still no universally accepted way to sanitize the other sources of user input that I listed.

My own contribution is this little piece of PHP code that allows any user input to be displayed on a web page, or sent to another web page through GET or POST controls and fields in FORM elements or through Ajax, without opening your website to malicious use:

function HTMLToSafeHTML($Str)
    {
    return str_replace(['&','<','>','"','\''], ['&amp;','&lt;','&gt;','&quot;','&apos;'], $Str);
    } // HTMLToSafeHTML

To use this function correctly, you must identify and track all user input, then call this function before displaying or otherwise allowing the user input to be interpreted as part of Web processing or programming. Identifying user input allows you to call this function only once. Calling it more than once will display its hard-to-read encoding, which is not useful as text.

For example, if you want to display an error message that shows some user input in boldface, you have to call HTMLToSafeHTML (which you can give a shorter name) on the user input before enclosing it in <strong>...</strong> to make it boldface. While it is harmless to display "<strong>", it is anything but harmless to display user input that might be the result of malicious users trying quite deliberately to break into your website in order to spread a virus or for some other evil purpose.
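The boldface case described above might look like the following sketch. The function is repeated so the snippet runs on its own; the variables $userInput, $message, and $twice are illustrative only.

```php
<?php
// HTMLToSafeHTML() repeated from above so this snippet is self-contained.
function HTMLToSafeHTML($Str)
    {
    return str_replace(['&','<','>','"','\''], ['&amp;','&lt;','&gt;','&quot;','&apos;'], $Str);
    } // HTMLToSafeHTML

$userInput = '<script>alert("x")</script>';

// Encode the user input first, then wrap it in trusted markup.
$message = '<strong>' . HTMLToSafeHTML($userInput) . '</strong>';
echo $message, "\n";
// prints: <strong>&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;</strong>

// Encoding twice makes the entities themselves visible to the reader.
$twice = HTMLToSafeHTML(HTMLToSafeHTML('<'));
echo $twice, "\n";
// prints: &amp;lt;
```

PHP's built-in htmlspecialchars with ENT_QUOTES covers the same five characters and could be used in place of the hand-written replacement.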

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow