Question

I have a comments form that is currently plaintext. This is nice and easy because I can just strip anything vaguely resembling HTML out of it using strip_tags and run htmlspecialchars on the rest.

However, there are plans to turn this into an HTML form with support for several tags (<a>, <b>, <p>, <h3>, etc.).

The problem I see is that even running strip_tags on the input and keeping only those tags would still leave me open to XSS and JavaScript inside the tags.
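To make the concern concrete, here is a minimal sketch of what I mean. The sample comment is made up for illustration; the point is that strip_tags with an allow-list keeps the attributes of the allowed tags untouched.

```php
<?php
// Hypothetical malicious comment, invented for illustration.
$comment = '<a href="#" onclick="alert(1)">click</a><script>alert(2)</script>';

// Keep only <a> and <b>. strip_tags removes the other tags, but it does not
// inspect or filter the attributes of the tags it keeps.
$filtered = strip_tags($comment, '<a><b>');

// The <script> tags are gone, but the onclick handler on <a> survives.
echo $filtered, "\n";
```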

I know that HTML Purifier can be used to fix this, but it looks really bulky and slow, and I find it hard to believe that there isn't a better way.

I had considered using a BBCode editor instead, but assuming that these insert code into the database as BB, how would I then convert it from BB back to HTML in order to display it?


Solution

Please think twice about whether you have the time and energy to write your own validation scheme, as this is pretty complex.

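For one, take a look at what happens if you filter in a single pass instead of repeating (or recursing) until nothing changes.

Bad word: CopyThis

Input: CopyCopyThisThis

Removing the bad word once deletes the inner match, which leaves you with

CopyThis

The bad word is back.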
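The effect is easy to reproduce. Here is a minimal sketch, assuming a naive filter built on str_replace (the variable names are illustrative only), with a loop standing in for the recursion mentioned above:

```php
<?php
// Hypothetical filter: remove a banned word from user input.
$badWord = 'CopyThis';
$input   = 'CopyCopyThisThis';

// Single pass: removing the inner match glues the outer halves together.
$once = str_replace($badWord, '', $input);

// Repeated passes: keep filtering until the string stops changing.
$clean = $input;
do {
    $previous = $clean;
    $clean = str_replace($badWord, '', $clean);
} while ($clean !== $previous);

var_dump($once);   // string(8) "CopyThis"
var_dump($clean);  // string(0) ""
```

Any filter that deletes text has this property, which is one reason escaping the input is usually safer than trying to remove the dangerous parts.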

Other tips

BBCode is indeed a very easy and useful solution for your problem.

I am using the MarkItUp! editor, but you can surely find many others.

"how would I then convert it from BB back to HTML in order to display it?"

The answer is simple: You can use preg_replace for it.

I implemented this a while back myself, so I can let you use my parser code. It translates the basic tags, as well as some custom ones like [center]. It's really easy to add your own tags or replace the current ones.

The script consists of a huge array with regexes and replacements, and one preg_replace call.

function replaceBBcode($str) {
    $replace = array(

        // inline text formats
        '/\[b\](.*?)\[\/b\]/is'                         => '<b>$1</b>',
        '/\[i\](.*?)\[\/i\]/is'                         => '<i>$1</i>',
        '/\[u\](.*?)\[\/u\]/is'                         => '<u>$1</u>',
        '/\[s\](.*?)\[\/s\]/is'                         => '<s>$1</s>',
        '/\[sup\](.*?)\[\/sup\]/is'                     => '<sup>$1</sup>',
        '/\[sub\](.*?)\[\/sub\]/is'                     => '<sub>$1</sub>',

        // headings
        '/\[h1\](.*?)\[\/h1\]/is'                       => '<h1>$1</h1>',
        '/\[h2\](.*?)\[\/h2\]/is'                       => '<h2>$1</h2>',
        '/\[h3\](.*?)\[\/h3\]/is'                       => '<h3>$1</h3>',
        '/\[h4\](.*?)\[\/h4\]/is'                       => '<h4>$1</h4>',
        '/\[h5\](.*?)\[\/h5\]/is'                       => '<h5>$1</h5>',

        // formatting tags
        '/\[(?:hr|line)\]/is'                           => '<hr />',
        '/\[br\/?\]/is'                                 => '<br />',

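        // links and images: only http(s) URLs without quotes or whitespace are
        // accepted, so javascript: URLs and injected attributes never reach the output
        '/\[url=(https?:\/\/[^\]\s"\']+)\](.*?)\[\/url\]/is'    => '<a href="$1">$2</a>',
        '/\[link=(https?:\/\/[^\]\s"\']+)\](.*?)\[\/link\]/is'  => '<a href="$1">$2</a>',
        '/\[url\](https?:\/\/[^\[\]\s"\']+)\[\/url\]/is'        => '<a href="$1" title="$1">$1</a>',
        '/\[link\](https?:\/\/[^\[\]\s"\']+)\[\/link\]/is'      => '<a href="$1" title="$1">$1</a>',

        '/\[img=(https?:\/\/[^\]\s"\']+)\]/is'                  => '<img src="$1" alt="" />',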

        // text blocks and block formats
        // style values are limited to simple names and hex colors, so a value
        // cannot close the style attribute or add extra CSS declarations
        '/\[font=([a-z0-9 ,\-]+)\](.*?)\[\/font\]/is'   => '<span style="font-family: $1;">$2</span>',
        '/\[size=([0-9]+)\](.*?)\[\/size\]/is'          => '<span style="font-size: $1pt;">$2</span>',
        '/\[color=(#[0-9a-f]{3,8}|[a-z]+)\](.*?)\[\/color\]/is'     => '<span style="color: $1;">$2</span>',
        '/\[bgcolor=(#[0-9a-f]{3,8}|[a-z]+)\](.*?)\[\/bgcolor\]/is' => '<span style="background-color: $1;">$2</span>',
        '/\[p\](.*?)\[\/p\]/is'                         => '<p>$1</p>',

        // alignment blocks
        '/\[align=(left|center|right|justify)\](.*?)\[\/align\]/is'     => '<div style="text-align: $1;">$2</div>',
        '/\[center\](.*?)\[\/center\]/is'               => '<div style="text-align: center;">$1</div>',
        '/\[left\](.*?)\[\/left\]/is'                   => '<div style="text-align: left;">$1</div>',
        '/\[right\](.*?)\[\/right\]/is'                 => '<div style="text-align: right;">$1</div>',
        '/\[justify\](.*?)\[\/justify\]/is'             => '<div style="text-align: justify;">$1</div>',

        // lists
        '/\[list=(disc|circle|square)\](.*?)\[\/list\]/is'  => '<ul style="list-style-type:$1;">$2</ul>',
        '/\[list\](.*?)\[\/list\]/is'                   => '<ul>$1</ul>',
        '/\[list=a\](.*?)\[\/list\]/s'                  => '<ol style="list-style-type:lower-alpha;">$1</ol>',
        '/\[LIST=a\](.*?)\[\/LIST\]/s'                  => '<ol style="list-style-type:lower-alpha;">$1</ol>',
        '/\[list=A\](.*?)\[\/list\]/s'                  => '<ol style="list-style-type:upper-alpha;">$1</ol>',
        '/\[LIST=A\](.*?)\[\/LIST\]/s'                  => '<ol style="list-style-type:upper-alpha;">$1</ol>',
        '/\[list=1\](.*?)\[\/list\]/is'                 => '<ol style="list-style-type:decimal;">$1</ol>',
        '/\[list=I\](.*?)\[\/list\]/is'                 => '<ol style="list-style-type:upper-roman;">$1</ol>',
        '/\[\*\]/is'                                    => '<li>',

        // videos
        '/\[(?:youtube|video|media|movie){1}\](?:https?\:\/\/)?(?:www\.)?(?:youtube\.com\/watch\?v=|youtube\.com\/v\/|youtu\.be\/)?([a-z0-9\-\_]+)\[\/(?:youtube|video|media|movie){1}\]/is'
                    => '<iframe width="560" height="315" src="https://www.youtube.com/embed/$1?wmode=opaque" frameborder="0" allowfullscreen></iframe>',
    );

    // apply all replacements in one call
    $str = preg_replace(array_keys($replace), array_values($replace), $str);

    return $str;
}

This will not, however, work well for nested tags. To process text with nested tags, I use a loop like this:

$str = "... text to process ...";

// remove unwanted tags
$str = strip_tags($str);

// make entities of special chars, including quotes, so user input cannot
// break out of the attributes generated by replaceBBcode()
$str = htmlentities($str, ENT_QUOTES, 'UTF-8');

$str_old = "";

// repeat until a pass changes nothing, so nested tags are converted too
do {
    $str_old = $str;
    $str = replaceBBcode($str);
} while ($str_old !== $str);

// now $str contains the final html tags
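
To see the whole pipeline in one place, here is a self-contained sketch. It is not the full parser above: miniBBcode and renderComment are hypothetical names, the tag table is cut down to three entries, and htmlspecialchars with ENT_QUOTES is used for the escaping step as an assumption about what a stricter setup might look like.

```php
<?php
// Reduced, hypothetical tag table: only [b], [i] and [url=...].
function miniBBcode(string $str): string {
    $replace = array(
        '/\[b\](.*?)\[\/b\]/is' => '<b>$1</b>',
        '/\[i\](.*?)\[\/i\]/is' => '<i>$1</i>',
        // Only http(s) URLs, so javascript: links never become an href.
        '/\[url=(https?:\/\/[^\]\s]+)\](.*?)\[\/url\]/is' => '<a href="$1">$2</a>',
    );
    return preg_replace(array_keys($replace), array_values($replace), $str);
}

function renderComment(string $raw): string {
    // Escape first, so user-supplied HTML reaches the page as text, not markup.
    $str = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

    // Repeat until a pass changes nothing, which converts nested tags too.
    do {
        $old = $str;
        $str = miniBBcode($str);
    } while ($old !== $str);

    return $str;
}

echo renderComment('[b]bold [b]nested[/b][/b] <script>alert(1)</script>'), "\n";
echo renderComment('[url=javascript:alert(1)]x[/url]'), "\n";
```

The first call produces nested <b> elements and an escaped, inert <script> string; the second is left as literal text because the URL does not start with http or https.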
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow