Question

I have an entry form where the user can type arbitrary HTML. What do I need to filter out besides script tags? Here's what I do:

userInput.replace(/<(script)/gi, "&lt;$1");

but the sanitizer of WMD (used here on SO) manages a white list of tags, and filters out (blanks) all other tags. Why?

I don't like white lists because I don't want to prevent the user from entering arbitrary tags if she so chooses; but I can use a more extensive black list, besides 'script', if needed. What do I need as a black list?


Solution

Short answer: anything that lets them do what the script tag does, that is, anything that can run script.

The script tag is not required to run JavaScript. Script can be placed in almost every HTML tag, in places including, but not limited to, src and href attributes (which hold URLs), event handler attributes, and the style attribute.
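As a rough illustration of the point above, the sketch below wraps the replace call from the question in a function (the name `escapeScriptTags` and the sample payloads are made up for this example) and checks that some well-known script-bearing inputs pass through it untouched.

```javascript
// The blacklist from the question: neutralizes only an opening <script tag.
// The function name is hypothetical; the regex is the one from the question.
function escapeScriptTags(userInput) {
  return userInput.replace(/<(script)/gi, "&lt;$1");
}

// Illustrative payloads, none of which contains a script tag.
const payloads = [
  '<img src="x" onerror="alert(1)">',        // event handler attribute
  '<a href="javascript:alert(1)">click</a>', // script URL in an href attribute
  '<svg onload="alert(1)"></svg>',           // event handler on another element
];

// Collect every payload that the filter returns unchanged.
const survivors = payloads.filter((p) => escapeScriptTags(p) === p);

console.log(survivors.length === payloads.length); // prints true
```

These three are only samples. The list of such vectors is open-ended, which is why a black list tends to lag behind the attacks it is meant to stop.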

The ability for a user to put unwanted script into your page is a security vulnerability known as cross-site scripting (XSS). Read around this topic and read the OWASP XSS prevention cheat sheet.

You may not want to let users add HTML to your pages at all. If you need this feature, consider another format such as Markdown, which allows you to disable any embedded HTML. Another, less secure, option is to use a filtering library that tries to remove all script, such as HTMLPurifier. If you choose the filtering option, subscribe to announcements of new releases and keep installing the bug-fixed releases of the filter in your project as new exploits are found and worked around.
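If you decide not to accept HTML from users at all, the simplest approach is to escape every HTML-significant character before output, so the input is displayed as text rather than parsed as markup. A minimal sketch, assuming the value is inserted into element content or a quoted attribute value (the `escapeHtml` name is made up for this example, and it is not sufficient for script or URL contexts):

```javascript
// Hypothetical helper: escapes the five characters that are significant in
// HTML element content and quoted attribute values.
function escapeHtml(text) {
  const entities = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  };
  // Replace each special character with its entity in a single pass,
  // so an ampersand produced by one replacement is never escaped again.
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

console.log(escapeHtml('<img src="x" onerror="alert(1)">'));
// prints &lt;img src=&quot;x&quot; onerror=&quot;alert(1)&quot;&gt;
```

Unlike a black list, this does not try to recognize dangerous constructs. It removes the ability to create any tag or attribute at all, which is the trade-off you accept when you disallow user HTML.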

Licensed under: CC-BY-SA with attribution