Question

If I have a user entering data into a rich text editor (tiny editor), and I store the submitted data in a database and later retrieve it to show on other dynamic web pages, why do I need encoding here?

Is the only reason that someone might paste JavaScript into the rich text editor? Is there any other reason?

Solution

Security is the reason.

The most obvious and common reason is cross-site scripting (XSS), which is the root cause of many of the security problems you are likely to see on a site that displays user-supplied content.

Cross-site scripting (XSS) is a type of computer security vulnerability typically found in web applications that enables malicious attackers to inject client-side script into web pages viewed by other users. An exploited cross-site scripting vulnerability can be used by attackers to bypass access controls such as the same origin policy. Cross-site scripting carried out on websites accounted for roughly 80% of all security vulnerabilities documented by Symantec as of 2007. The impact may range from a petty nuisance to a significant security risk, depending on the sensitivity of the data handled by the vulnerable site and the nature of any security mitigations implemented by the site's owner.

Additionally, as the other answers point out, user-supplied markup can also break the layout of your site.

To guard against this, you can use the Microsoft Anti-Cross Site Scripting Library.

More Resources

http://forums.asp.net/t/1223756.aspx

OTHER TIPS

You're making some mistakes.

If you're accepting HTML-formatted text from the rich-text editor, you cannot call Html.Encode, or it will encode all of the HTML tags, and you'll see raw markup instead of formatted text.

However, you still need to protect against XSS.

In other words, if the user enters the following HTML:

<b>Hello!</b>
<script>alert('XSS!');</script>

You want to keep the <b> tag, but drop (not encode) the <script> tag.
Similarly, you need to drop inline event attributes (like onmouseover) and JavaScript URLs (like <a href="javascript:alert('XSS!');">Dancing Bunnies!</a>).

You should run the user's HTML through a strict XML parser and maintain a strict white-list of tags and attributes when saving the content.
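
As a rough illustration of the allow-list idea above, here is a minimal sketch in Python using only the standard library. It is not the ASP.NET code you would actually ship: the tag list, the scheme list, and the names ALLOWED_TAGS, is_safe_url and sanitize are all assumptions made up for this example, and it uses Python's lenient HTML parser rather than a strict XML parser. A maintained sanitizer library is a better choice for production.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list for illustration: tag -> attributes permitted on it.
ALLOWED_TAGS = {"b": set(), "i": set(), "em": set(), "strong": set(),
                "p": set(), "a": {"href"}}
SAFE_SCHEMES = {"", "http", "https", "mailto"}  # "" means a relative URL
DROP_WITH_CONTENT = {"script", "style"}  # remove these tags and their bodies


def is_safe_url(url):
    # Remove whitespace and control characters before looking at the scheme,
    # because browsers tolerate some of them inside "javascript:".
    cleaned = "".join(ch for ch in url if ch > " ")
    match = re.match(r"([A-Za-z][A-Za-z0-9+.-]*):", cleaned)
    scheme = match.group(1).lower() if match else ""
    return scheme in SAFE_SCHEMES


class AllowListSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # drop the tag itself (its text content is still kept)
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_TAGS[tag] or value is None:
                continue  # drops onmouseover and every other unlisted attribute
            if name == "href" and not is_safe_url(value):
                continue  # drops javascript: URLs, even entity-encoded ones
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))  # re-encode plain text


def sanitize(fragment):
    parser = AllowListSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


print(sanitize("<b>Hello!</b>\n<script>alert('XSS!');</script>"))
print(sanitize('<a href="javascript:alert(1)" onmouseover="x()">Dancing Bunnies!</a>'))
```

The first call keeps the <b> element and drops the script block entirely; the second keeps the link text but strips both the href and the onmouseover attribute. Note that the sketch decides what to keep, never what to remove, so anything not explicitly listed is dropped.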

I think you're confusing "encoding" with "scrubbing."

If you want to accept text from a user, you need to encode it as HTML before you render it as HTML. In this way, the text

a < b

is HTML-encoded as

a &lt; b

and rendered in an HTML browser (just as the user entered it) as:

a < b

If you want to accept HTML from a user (which it sounds like you do in this case), it's already in HTML format, so you don't want to call Html.Encode on it again. However, you may want to scrub it to remove certain markup that you don't allow (like script blocks).
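
To make the difference concrete, here is a small sketch using Python's standard html.escape as a stand-in for Html.Encode (an assumption for illustration; the two are not the same API, but both replace the HTML special characters with entities). The variable names are invented for the example.

```python
from html import escape

# Plain text from a user: encoding makes it display exactly as typed.
user_text = "a < b"
encoded_text = escape(user_text)
print(encoded_text)  # a &lt; b

# HTML from a rich-text editor: encoding it again turns the tags into
# visible markup instead of formatting.
user_html = "<b>Hello!</b>"
encoded_html = escape(user_html)
print(encoded_html)  # &lt;b&gt;Hello!&lt;/b&gt;
```

The second result is why encoding is the right tool for plain text but the wrong one for content that is meant to stay HTML; that content needs scrubbing instead.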

Security is the main reason.

A user could enter JavaScript code or some other naughtiness, but that is not the only issue: you also need to HTML-encode text so that certain characters display correctly on the page. You wouldn't want your page to break because your database contained: "Nice Page :->".

Also, if you are storing the content in a database, be sure to sanitize the inputs to the database.
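
On the database side, "sanitizing" usually means keeping user input out of the SQL text altogether by using parameterized queries. The sketch below assumes that reading of the advice and uses Python's built-in sqlite3 module with a made-up comments table and save_comment helper; the same idea applies to parameterized commands in other database APIs.

```python
import sqlite3


def save_comment(conn, author, body):
    # The ? placeholders send the values separately from the SQL text,
    # so quotes inside the input cannot change the statement.
    conn.execute(
        "INSERT INTO comments (author, body) VALUES (?, ?)",
        (author, body),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (author TEXT, body TEXT)")

# Input that would break a query built by string concatenation.
save_comment(conn, "mallory", "x'); DROP TABLE comments; --")

rows = conn.execute("SELECT author, body FROM comments").fetchall()
print(rows)
```

The hostile string is stored as ordinary data and the table survives. Note that this protects the database only; the stored value still has to be encoded or scrubbed when it is rendered.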

Yes, it is to prevent JavaScript from executing if someone were to input a malicious string into the rich text editor. However, plain-text JavaScript is not your only concern. For example, this is an XSS attack:

<IMG SRC=&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083&#0000039&#0000041>

Take a look here for a range of different XSS options; http://ha.ckers.org/xss.html
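
To see why that IMG example is dangerous, it helps to decode the character references the way a browser would. The following sketch uses Python's standard html.unescape for that; the variable names are invented for the example.

```python
from html import unescape

# The SRC value from the example above: decimal character references
# with leading zeros and no terminating semicolons.
payload = (
    "&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114"
    "&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101"
    "&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083"
    "&#0000039&#0000041"
)

# A naive filter that searches the raw input finds nothing suspicious.
naive_filter_hit = "javascript:" in payload.lower()
print(naive_filter_hit)  # False

# After decoding, the real URL is revealed.
decoded = unescape(payload)
print(decoded)  # javascript:alert('XSS')
```

This is why string matching on the raw input is not enough: any check has to run on the parsed and decoded value, and an allow-list is safer than trying to enumerate every disguise.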

As an aside, MVC2 has implemented new functionality so you no longer need to call Html.Encode explicitly.

If you change your view syntax from

<%= ... %>

to

<%: ... %>

MVC will automatically encode the output for you. It makes things much easier and quicker. Again, this is MVC2 only.

Another reason is that a user can input a few stray closing tags, such as </div></table>, and potentially break the layout of your web site. If you are using an HTML editing tool, make sure the HTML it produces is valid before embedding it in the page without encoding. Some server-side parsing is required in order to do this. You can use HtmlAgilityPack for that.
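
As a rough idea of what such a server-side check might look like, here is a minimal sketch that only verifies that tags are balanced. It uses Python's standard html.parser rather than HtmlAgilityPack, and the names VOID_TAGS, BalanceChecker and is_balanced are assumptions made up for this example. It is deliberately strict and does not model every HTML rule (for instance, optional end tags).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (partial list, for illustration).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class BalanceChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.errors = []  # closing tags that match nothing

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing form such as <br/>: nothing to balance

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def is_balanced(fragment):
    checker = BalanceChecker()
    checker.feed(fragment)
    checker.close()
    # Valid only if nothing closed out of order and nothing is left open.
    return not checker.errors and not checker.stack


print(is_balanced("<p>Hello <b>world</b></p>"))  # True
print(is_balanced("Nice page</div></table>"))    # False
```

A fragment that fails the check can be rejected or repaired before it is stored, so one user's stray closing tags cannot close your page's own containers.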

The primary reason to do what you're suggesting is to escape your output. Since you are accepting HTML and want to output it as HTML, you can't simply escape it. What you need to do is filter out the things users can do that are insecure, or at least not what you want.

For that, let me suggest AntiSamy.

You can demo it here.

What you are doing has a lot of inherent risks, and you should consider it very carefully.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow