I want to sanitize user input to help prevent XSS attacks, and we don't necessarily need an HTML whitelist, as our users shouldn't need to post any HTML / CSS.

Eyeing the alternatives out there, which would be better: [Apache Commons Text's StringEscapeUtils][1] or [JSoup Cleaner][2]?

Thanks!

Update:

I went with JSoup after writing some unit tests for both it and Apache Commons Text.

I like how JSoup won't mess with single quotation marks (i.e. "Alan's mom" is left unchanged, whereas Apache Commons Text turns it into "Alan\'s mom").

And the whitelist wasn't a problem at all. It didn't require any configuration; rather, JSoup ships with some built-in options, which may come in handy if we choose to allow some subsets of HTML tags.

[1]: https://commons.apache.org/proper/commons-text/apidocs/org/apache/commons/text/StringEscapeUtils.html
[2]: http://jsoup.org/cookbook/cleaning-html/whitelist-sanitizer

Solution

"Better"? I don't think it matters. JSoup's Cleaner has Whitelist.none(), which strips all HTML and keeps only the text, while StringEscapeUtils will escape everything.

It depends on how you want the "cleaned" input to render: do you want just the text nodes, or do you want the escaped HTML to show up?

Other answers

I would love to see Cuga's test cases, because if you are using Apache Commons Lang's escapeHtml in 2.6 or escapeHtml4 in 3+, it does not add slashes. It simply converts characters to HTML entities, as the documentation clearly states.

I even have a public example to test this out:

https://gist.github.com/croucha/2e2925264890886cbf4d

So please prove me wrong; otherwise, your claim that the escaping adds slashes is incorrect. If you still want to display these unsafe characters but prevent them from executing in the browser, then your best option is Apache Commons. As far as I can tell, Jsoup omits those characters entirely, including the content they enclose, even when it is safe.
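To make the entity-conversion point concrete without pulling in either library, here is a rough, stdlib-only sketch. `EntityEscapeSketch` and `escapeBasicHtml` are hypothetical names made up for this illustration; this is not Commons' implementation, and it only handles the four basic characters (`&`, `<`, `>`, `"`), whereas escapeHtml4 covers a much larger entity table. The thing to notice is that the apostrophe passes through untouched and no backslashes appear anywhere.

```java
// Hypothetical helper for illustration only; NOT the Apache Commons implementation.
final class EntityEscapeSketch {

    // Replaces the four basic HTML metacharacters with named entities.
    // The apostrophe is deliberately left alone, and no backslashes are added.
    static String escapeBasicHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: Alan's mom
        System.out.println(escapeBasicHtml("Alan's mom"));
        // prints: &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
        System.out.println(escapeBasicHtml("<script>alert(\"x\")</script>"));
    }
}
```

With this kind of escaping the browser shows the original markup as literal text instead of running it, which is the "escaped HTML shows up" rendering mentioned in the accepted answer.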

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow