Question
In a web project, we use OWASP ESAPI in PHP for output encoding. At some points, we'd like to allow a subset of HTML for a few formatting options (for example, <i> and <b>), while disallowing all other tags and special characters (so they are entity-encoded using the &...; syntax).
I see the following possibilities to achieve this:
In particular, I need the following tags and attributes to be white-listed:
<br>
<i>
<b>
<u>
<big>
<small>
<sub>
<sup>
<font color="...">
<ul> + <li>
<ol> + <li>
Please note that our application is security-critical. Any method we implement should accept only the tags above (and maybe a few more formatting-only tags); everything else has to be entity-encoded properly. It should be easy to verify beyond doubt that this holds, just by looking at the (simple) code or an explanation of it. The shorter the code, the easier the review. Fully hand-crafted encoders aren't good for this.
Solution
It sounds like what you are actually looking for is HTMLPurifier.
FWIW, I am not affiliated with HTMLPurifier at all; I am the Project Leader of the OWASP ESAPI project.
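As a rough illustration of that suggestion, the whitelist from the question could be expressed as HTMLPurifier configuration along these lines. This is only a sketch under assumptions: it presumes the library was installed through Composer (package ezyang/htmlpurifier) so that vendor/autoload.php exists, and that the default doctype is in use. The sample input string is made up.

```php
<?php
// Assumption: HTMLPurifier was installed with Composer (ezyang/htmlpurifier).
require_once __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();

// Whitelist taken from the question; <font> may carry only the color attribute.
$config->set('HTML.Allowed', 'br,i,b,u,big,small,sub,sup,font[color],ul,ol,li');

// Write disallowed tags back as escaped text instead of silently dropping them.
$config->set('Core.EscapeInvalidTags', true);

$purifier = new HTMLPurifier($config);

// Made-up sample input: an event handler attribute and a script element.
echo $purifier->purify('<b onmouseover="alert(1)">bold</b><script>alert(1)</script>');
```

The whole policy sits in the two `set()` calls, which keeps the part a reviewer has to check very small.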
Other tips
In several of the projects I've worked on in the past, I used the CodeIgniter Security class as a guideline when implementing my own security functions.
Check out the source on GitHub here: https://github.com/EllisLab/CodeIgniter/blob/develop/system/core/Security.php
Several parts of this code deal with encoding and XSS, which might be useful for your situation.
Check out the code below. It was tested on PHP 5.3 and works fine; comments are included in the code.
<?php
// test input data
$data = 'br1<br>,br3<br >, br4<br/>,br2<br />, <i>i1</i><b>b1</b> foo<b onmouseover="alert(hi);">b1</b><u tricky="hello">u1</u> <big>big1</big><small>small1</small> <sub>sub1</sub><div name="aaaa">div1</div> <sup>sup1</sup><font>font1</font> <font >font2</font><font onmouseover="alert(\'hi\');" color="red" style="background-color:green;">font3</font><font onmouseover="alert(\'hi\');" color="red" style="background-color:green;" >font4</font><ul><li>li1</li></ul><ol><li>li2</li></ol>';
// set the allowed tags and their allowed attribs
// case-insensitive
$allowed = array(
    "br" => "", // second value is allowed attrs, "" means all attrs allowed for this tag
    "i" => "",
    "b" => "",
    "u" => "",
    "big" => "",
    "small" => "",
    "sub" => "",
    "sup" => "",
    "font" => "color,style", // comma separated list of allowed attrs, other attrs will be stripped out
    "ul" => "",
    "ol" => "",
    "li" => "",
);
// this will contain output results
$outdata='';
// this func will do the job
fixit();
// print out the results
echo $outdata;
function start_element_handler($parser, $name, $attrs) {
    global $outdata, $allowed;
    // tag allowed, check further
    if (in_array($name, array_keys($allowed))) {
        $attrout = "";
        if (!empty($attrs)) {
            foreach ($attrs as $attr => $val) {
                // attr allowed, write normal output, else skip the attr/val;
                // the parser delivers the value entity-decoded, so re-encode it
                // to keep a quote inside the value from closing the attribute early
                if (empty($allowed[$name]) || in_array($attr, $allowed[$name]))
                    $attrout = $attrout." ".strtolower($attr)."=\"".htmlspecialchars($val, ENT_QUOTES, 'UTF-8')."\" ";
            }
        }
        $outdata = $outdata."<".strtolower($name)."$attrout>";
    }
    // tag not allowed, htmlentityencode the output
    else {
        $attrout = "";
        if (!empty($attrs)) {
            foreach ($attrs as $attr => $val) {
                $attrout = $attrout." ".strtolower($attr)."=\"$val\" ";
            }
        }
        $outdata = $outdata.htmlentities("<".strtolower($name)."$attrout>", ENT_COMPAT, 'UTF-8');
    }
}
function end_element_handler($parser, $name) {
    global $outdata, $allowed;
    // void elements have no ending tags, so skip writing to output
    $voids = array("AREA", "BASE", "BR", "COL", "COMMAND", "EMBED", "HR", "IMG", "INPUT", "KEYGEN", "LINK", "META", "PARAM", "SOURCE", "TRACK", "WBR");
    if (in_array($name, $voids))
        return;
    $nameout = "</".strtolower($name).">";
    // tag allowed, write normal output
    if (in_array($name, array_keys($allowed))) {
        $outdata = $outdata.$nameout;
    }
    // tag not allowed, htmlentityencode the output
    else {
        $outdata = $outdata.htmlentities($nameout, ENT_COMPAT, 'UTF-8');
    }
}
function default_handler($parser, $data) {
    global $outdata, $allowed;
    // everything without a dedicated handler (text, comments, ...) is entity-encoded
    $outdata = $outdata.htmlentities($data, ENT_COMPAT, 'UTF-8');
}
function fixit() {
    global $data, $allowed, $outdata;
    $outdata = "";
    // added temp body tags to make xml parser work fine
    $data = "<body>{$data}</body>";
    // change allowed tags and attrs to uppercase for comparisons later
    $allowed = array_change_key_case($allowed, CASE_UPPER);
    foreach ($allowed as $tag => $attrs)
        $allowed[$tag] = array_filter(explode(",", strtoupper($attrs)));
    // create the parser
    $parser = xml_parser_create('UTF-8');
    // set to uppercase comparisons
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, true);
    xml_set_element_handler($parser, 'start_element_handler', 'end_element_handler');
    xml_set_default_handler($parser, 'default_handler');
    // parse the data
    xml_parse($parser, $data, true);
    xml_parser_free($parser);
    // "body" is not whitelisted, so the temp wrapper reaches $outdata entity-encoded;
    // strip it only at the very start and end, so body tags typed by the user stay visible as text
    $outdata = preg_replace('/^&lt;body&gt;|&lt;\/body&gt;$/', '', $outdata);
}
?>
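If the parser-based code above is more than a review can comfortably cover, a shorter alternative is to encode everything first and then re-enable only exact matches of the whitelisted tags. The sketch below is a different technique from the one above, not a variant of it. The function name `encode_with_whitelist` and the accepted color formats (a plain name, or a 3- or 6-digit hex value) are assumptions made for illustration.

```php
<?php
// Hypothetical helper for illustration; not part of ESAPI or of the code above.
function encode_with_whitelist($input) {
    // Step 1: entity-encode everything, so no markup survives at all.
    $out = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // Step 2: re-enable exact, attribute-free forms of the whitelisted tags.
    $plain = 'br|i|b|u|big|small|sub|sup|ul|ol|li';
    $out = preg_replace('~&lt;(' . $plain . ')\s*/?&gt;~i', '<$1>', $out);
    $out = preg_replace('~&lt;/(' . $plain . '|font)&gt;~i', '</$1>', $out);

    // Step 3: <font> is accepted only with one strictly validated color attribute.
    $out = preg_replace(
        '~&lt;font color=&quot;(#[0-9a-f]{6}|#[0-9a-f]{3}|[a-z]+)&quot;&gt;~i',
        '<font color="$1">',
        $out
    );
    return $out;
}

// Made-up sample input mixing allowed and disallowed markup.
echo encode_with_whitelist('<b>ok</b> <font color="red">red</font> <script>alert(1)</script>'), "\n";
```

A tag carrying any attribute other than the validated `color` never matches the patterns, so it stays encoded. This sketch does not check that tags are balanced, so an unclosed `<b>` would still affect the rest of the page.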