Question

I want to use a white list of tags, attributes and values to sanitize a html string, before I place it in the dom. Can safely I construct a dom element, and traverse over that to implement the white list filter, assuming that no malicious javascript could execute until I append the dom element to the document? Are there pitfalls to this approach?

Was it helpful?

Solution 2

No script embedded in the HTML can execute until it is put in the document. Try running this code on any page:

var html = "<script>document.body.innerHTML = '';</script>";
var div = document.createElement('div');
div.innerHTML = html;

You will notice nothing change. If the "malicious" script in the HTML was run, then the document should have vanished. So, you can use the DOM to sanitize HTML without worrying about bad JS being in the HTML. As long as you snip out the script in your sanitizer of course.


By the way, your approach is pretty safe and smarter than what most people try (parse it with regex, the poor fools). However, it's best to rely on good, trusted HTML sanitizing libraries for this, like HTML Purifier. Or, if you want to do it client-side, you can use ESAPI-JS (recommended by @Brett Zamir)

OTHER TIPS

It doesn't appear that anything will execute until you insert into the document, as per @rvighne's answer, but there are at least these (unusual) exceptions (tested in FF 27.0):

var userInput = '<a href="http://example.com" onclick="alert(\'boo!\')">Link<\/a>';
var el = document.createElement('div');
el.innerHTML = userInput;
el.addEventListener("click", function(e) {
    if (e.target.nodeName.toLowerCase() === 'a') {
        alert("I will also cause side effects; I shouldn't run on the wrong link!");
    }
});
el.getElementsByTagName('a')[0].click(); // Alerts "boo!" and "I will also cause side effects; I shouldn't run on the wrong link!"

...or...

var userInput = '<a href="http://example.com" onclick="alert(\'boo!\')">Link<\/a>';
var el = document.createElement('div');
el.innerHTML = userInput;
el.addEventListener("cat", function(e) { this.getElementsByTagName('a')[0].click(); });
var event = new CustomEvent("cat", {"detail":{}});
el.dispatchEvent(event); // Alerts "boo!"

...or... (though setUserData is deprecated, it is still working):

var userInput = '<a href="http://example.com" onclick="alert(\'boo!\')">Link<\/a>';
var span = document.createElement('span');
span.innerHTML = userInput;
span.setUserData('key', 10, {handle: function (n1, n2, n3, src) {
    src.getElementsByTagName('a')[0].click();
}});
var div = document.createElement('div');
div.appendChild(span);
span.cloneNode(); // Alerts "Boo!"    
var imprt = document.importNode(span, true); // Alerts "Boo!"
var adopt = document.adoptNode(span, true); // Alerts "Boo!"

...or during iteration...

var userInput = '<a href="http://example.com" onclick="alert(\'Boo!\');">Link</a>';
var span = document.createElement('span');
span.innerHTML = userInput;
var treeWalker = document.createTreeWalker(
  span,
  NodeFilter.SHOW_ELEMENT,
  { acceptNode: function(node) { node.click(); } },
  false
);
var nodeList = [];
while(treeWalker.nextNode()) nodeList.push(treeWalker.currentNode); // Alerts 'Boo!'

But without these kind of (unusual) event interactions, the fact of building into the DOM alone would not, as far as I have been able to detect, cause any side effects (and of course the examples above are contrived and one wouldn't expect to encounter them very often if at all!).

You can use a "sandboxed" iframe that won't execute anything.

var iframe = document.createElement('iframe');
iframe['sandbox'] = 'allow-same-origin';

From w3schools:

The sandbox attribute enables an extra set of restrictions for the content in the iframe. When the sandbox attribute is present, and it will:

  • block form submission
  • block script execution
  • disable APIs
  • ...

P.S. That's, by the way, exactly how we do it in our Html Sanitizer https://github.com/jitbit/HtmlSanitizer - we use the browser to interpret HTML and convert it to DOM. Feel free to check the code (or actually use the component)

(disclaimer: I'm the contributor to that OSS project)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top