In a web project, we use the PHP version of OWASP ESAPI for output encoding. In some places we'd like to allow a subset of HTML for simple formatting (for example, <i> and <b>) while disallowing all other tags and special characters, so that they are entity-encoded using the &...; syntax.

I see the following possibilities to achieve this:

  1. Tell the OWASP ESAPI encoder to white-list / allow these tags so that it only encodes all other HTML tags and entities properly. But this doesn't seem to be supported. We could however (maybe) write a patch which allows this.
  2. Decode the white-listed tags after encoding with ESAPI. Can this be attacked?
  3. Use some other output encoding technique for this use case. Are there other libraries?

In particular, I need the following tags and attributes to be white-listed:

  • <br>
  • <i>
  • <b>
  • <u>
  • <big>
  • <small>
  • <sub>
  • <sup>
  • <font color="...">
  • <ul> + <li>
  • <ol> + <li>

Please note that our application is security-critical. Any method we implement must accept only the tags above (and maybe a few more formatting-only tags); everything else has to be entity-encoded properly. It should be possible to verify this beyond doubt by reading the (simple) code or an explanation of it. The shorter the code, the easier the review, so fully hand-crafted encoders are not a good fit.
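
For reference, option 2 from the list above might look roughly like the sketch below. It assumes htmlspecialchars() as a stand-in for the ESAPI encoder (the real ESAPI output encodes more characters, so the patterns would need adjusting), and encode_with_formatting is a hypothetical helper name. Only the exact encoded forms of the listed tags are turned back into markup; anything with extra attributes stays encoded.

```php
<?php
// Hypothetical helper: encode everything, then restore a fixed set of tags.
// ASSUMPTION: htmlspecialchars() stands in for the ESAPI encoder here.
function encode_with_formatting($input)
{
    // Step 1: entity-encode the whole input, including both quote styles.
    $encoded = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // Step 2: restore attribute-free tags, but only the exact forms
    // <tag> and </tag>; anything with attributes does not match.
    $simple = 'br|i|b|u|big|small|sub|sup|ul|ol|li';
    $encoded = preg_replace('~&lt;(/?)(' . $simple . ')&gt;~', '<$1$2>', $encoded);

    // Step 3: restore <font color="..."> only when color is the sole
    // attribute and its value is a #rrggbb code or a plain color name.
    $encoded = preg_replace(
        '~&lt;font color=&quot;(#[0-9a-fA-F]{6}|[a-zA-Z]{1,20})&quot;&gt;~',
        '<font color="$1">',
        $encoded
    );

    // Step 4: restore the closing font tag.
    return str_replace('&lt;/font&gt;', '</font>', $encoded);
}

echo encode_with_formatting('<script>alert(1)</script> <i>ok</i> <font color="red">r</font>');
// prints: &lt;script&gt;alert(1)&lt;/script&gt; <i>ok</i> <font color="red">r</font>
```

Note that this sketch does not balance tags: a closing tag is restored even if its opening tag stayed encoded, which produces untidy but still fully encoded markup.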


Solution

It sounds like what you are actually looking for is HTMLPurifier:

http://htmlpurifier.org/

FWIW I am not affiliated with HTMLPurifier at all, and I am the Project Leader of the OWASP ESAPI project.
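
A minimal configuration sketch for the tag list in the question could look like this. It assumes HTML Purifier is already installed and loaded (for example through its bundled HTMLPurifier.auto.php or a Composer autoloader); purify_formatting is a hypothetical helper name. By default HTML Purifier drops tags that are not allowed; the Core.EscapeInvalidTags setting asks it to write them back as escaped text instead, which seems closer to what the question wants. Whether legacy elements such as font are accepted also depends on the configured doctype, so this should be checked against your setup.

```php
<?php
// ASSUMPTION: HTML Purifier has been loaded before this function is called,
// e.g. require_once '/path/to/HTMLPurifier.auto.php'; (path is setup-specific).
function purify_formatting($dirty)
{
    $config = HTMLPurifier_Config::createDefault();
    // Whitelist: elements separated by commas, allowed attributes in brackets.
    $config->set('HTML.Allowed', 'br,i,b,u,big,small,sub,sup,font[color],ul,ol,li');
    // Write disallowed tags back as escaped text instead of dropping them.
    $config->set('Core.EscapeInvalidTags', true);

    $purifier = new HTMLPurifier($config);
    return $purifier->purify($dirty);
}

// Demo call, only executed when the library is actually available.
if (class_exists('HTMLPurifier')) {
    echo purify_formatting('<b onmouseover="alert(1)">bold</b><script>alert(2)</script>');
}
```

The whitelist lives in a single configuration string, which keeps the part that needs a security review short.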

Other tips

In several past projects, I used the CodeIgniter Security class as a guideline for implementing my own security functions.

Check out the source on GitHub here: https://github.com/EllisLab/CodeIgniter/blob/develop/system/core/Security.php

Several parts of this code deal with encoding and XSS, which might be useful for your situation.

Check out the code below; it was tested on PHP 5.3 and works fine. Comments are included in the code.

<?php

// test input data
$data = 'br1<br>,br3<br >,      br4<br/>,br2<br />, <i>i1</i><b>b1</b> foo<b onmouseover="alert(hi);">b1</b><u tricky="hello">u1</u> <big>big1</big><small>small1</small> <sub>sub1</sub><div name="aaaa">div1</div> <sup>sup1</sup><font>font1</font> <font       >font2</font><font onmouseover="alert(\'hi\');"  color="red" style="background-color:green;">font3</font><font onmouseover="alert(\'hi\');"  color="red" style="background-color:green;" >font4</font><ul><li>li1</li></ul><ol><li>li2</li></ol>';

// set the allowed tags and their allowed attribs
// case-insensitive
$allowed = array(
"br" => "", // second value is the allowed attrs; "" means no attrs are allowed for this tag
"i" => "",
"b" => "",
"u" => "",
"big" => "",
"small" => "",
"sub" => "",
"sup" => "",
"font" => "color,style", // comma separated list of allowed attrs, other attrs will be stripped out
"ul" => "",
"ol" => "",
"li" => "",
);

// this will contain output results
$outdata='';
// this func will do the job
fixit();
// print out the results
echo $outdata;

function start_element_handler($parser, $name, $attrs) {
    global $outdata,$allowed;
    // tag allowed, check further
    if( isset($allowed[$name]) ) {
        $attrout="";
        if(!empty($attrs)) {
            foreach($attrs as $attr=>$val) {
                // attr allowed, write it out; else skip the attr/val
                // (the parser hands over decoded values, so re-escape them to keep quotes inside the attribute)
                if(in_array($attr,$allowed[$name]))
                    $attrout=$attrout." ".strtolower($attr)."=\"".htmlspecialchars($val,ENT_QUOTES,'UTF-8')."\" ";
            }
        }
        $outdata=$outdata."<".strtolower($name)."$attrout>";
    }
    // tag not allowed, htmlentityencode the output
    else {
        $attrout="";
        if(!empty($attrs)) {
            foreach($attrs as $attr=>$val) {
                $attrout=$attrout." ".strtolower($attr)."=\"$val\" ";
            }
        }
        $outdata=$outdata.htmlentities("<".strtolower($name)."$attrout>",ENT_COMPAT,'UTF-8');
    }
}

function end_element_handler($parser, $name) {
    global $outdata,$allowed;
    // void elements have no ending tags, so skip writing to output
    $voids = array("AREA", "BASE", "BR", "COL", "COMMAND", "EMBED", "HR", "IMG", "INPUT", "KEYGEN", "LINK", "META", "PARAM", "SOURCE", "TRACK", "WBR");
    if(in_array($name,$voids))
        return;
    $nameout = "</".strtolower($name).">";
    // tag allowed, write normal output
    if( in_array($name,array_keys($allowed)) ) {
        $outdata=$outdata.$nameout;
    }
    // tag not allowed, htmlentityencode the output 
    else {
        $outdata=$outdata.htmlentities($nameout,ENT_COMPAT,'UTF-8');
    }
}

function default_handler($parser, $data) {
    global $outdata;
    // everything that is not an element (text, comments, ...) is entity-encoded
    $outdata=$outdata.htmlentities($data,ENT_COMPAT,'UTF-8');
}

function fixit() {
    global $data, $allowed, $outdata;
    $outdata="";
    // added temp body tags to make xml parser work fine  
    $data="<body>{$data}</body>";
    // change allowed tags and attrs to uppercase for comparisons later
    $allowed=array_change_key_case($allowed,CASE_UPPER);
    foreach($allowed as $tag=>$attrs) $allowed[$tag] = array_filter(explode(",",strtoupper($attrs)));
    // create the parser
    $parser = xml_parser_create('UTF-8');
    // set to uppercase comparisons
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, true);
    xml_set_element_handler($parser, 'start_element_handler', 'end_element_handler');
    xml_set_default_handler($parser, 'default_handler');
    // parse the data 
    xml_parse($parser, $data, true);
    xml_parser_free($parser);
    // set output in $outdata variable
    $outdata = str_ireplace(array('&lt;body&gt;','&lt;/body&gt;'), "", $outdata); 
}

?>
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow