Question

I am using GetSafeHtmlFragment on my website and found that all tags except <p> and <a> are removed.

I researched this and found that Microsoft has not provided a fix for it.

Is there a replacement for it, or any other solution?

Thanks.


Solution 2

An alternative solution would be to use the Html Agility Pack in conjunction with your own tag whitelist:

using System;
using System.IO;
using System.Text;
using System.Linq;
using System.Collections.Generic;
using HtmlAgilityPack;

class Program
{
    static void Main(string[] args)
    {
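        // Node names to keep. "#text" and "#comment" are the names
        // HtmlAgilityPack gives to text and comment nodes; without
        // "#text" the text content itself would be stripped as well.
        var whiteList = new[] 
            { 
                "#text", "#comment", "html", "head", 
                "title", "body", "img", "p",
                "a"
            };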
        var html = File.ReadAllText("input.html");
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var nodesToRemove = new List<HtmlAgilityPack.HtmlNode>();
        var e = doc
            .CreateNavigator()
            .SelectDescendants(System.Xml.XPath.XPathNodeType.All, false)
            .GetEnumerator();
        while (e.MoveNext())
        {
            var node =
                ((HtmlAgilityPack.HtmlNodeNavigator)e.Current)
                .CurrentNode;
            if (!whiteList.Contains(node.Name))
            {
                nodesToRemove.Add(node);
            }
        }
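        // Removing a node also drops everything nested inside it.
        // Attributes of the nodes that are kept are not checked here.
        nodesToRemove.ForEach(node => node.Remove());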
        var sb = new StringBuilder();
        using (var w = new StringWriter(sb))
        {
            doc.Save(w);
        }
        Console.WriteLine(sb.ToString());
    }
}
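
The whitelist above only looks at tag names, so a kept <p> or <a> can still carry something like an onclick attribute or a javascript: link. As a rough sketch of how the same idea could be extended to attributes (the AttributeFilter class, the StripAttributes method and the list of allowed attributes are my own illustrative names and choices, not part of the answer above), something along these lines might work. The javascript: check is deliberately naive (it does not decode entities, for instance), so treat this as an illustration rather than a complete XSS defence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class AttributeFilter
{
    // Hypothetical allow-list of attribute names; adjust to your needs.
    static readonly HashSet<string> AllowedAttributes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "href", "src", "alt", "title"
        };

    public static string StripAttributes(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var elements = doc.DocumentNode
            .Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Element);

        foreach (var node in elements)
        {
            // Copy to a list first, because attributes are removed
            // from the live collection inside the loop.
            foreach (var attr in node.Attributes.ToList())
            {
                var value = (attr.Value ?? string.Empty).TrimStart();
                bool isScriptUrl = value.StartsWith(
                    "javascript:", StringComparison.OrdinalIgnoreCase);

                if (!AllowedAttributes.Contains(attr.Name) || isScriptUrl)
                {
                    attr.Remove();
                }
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

You would call AttributeFilter.StripAttributes(html) on the output of the tag filtering (or before it; the order should not matter for this sketch).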

OTHER TIPS

It is amazing that Microsoft, in version 4.2.1, so badly overcompensated for a security leak in version 4.2 of the XSS library, and still hasn't released an update a year later. As someone commented elsewhere, the GetSafeHtmlFragment method should have been renamed to StripHtml.

I ended up using the HtmlSanitizer library suggested in this related SO issue. I liked that it was available as a package through NuGet.

This library basically implements a variation of the whitelist approach that the now-accepted answer uses. However, it is based on CsQuery instead of the Html Agility Pack. The package also gives some additional options, like being able to keep style information (e.g. HTML attributes). Using this library resulted in code in my project something like below, which - at least - is a lot less code than the accepted answer :).

using System.Collections.Generic; // for List<string>
using Html;

...

var sanitizer = new HtmlSanitizer();
sanitizer.AllowedTags = new List<string> { "p", "ul", "li", "ol", "br" };
string sanitizedHtml = sanitizer.Sanitize(htmlString);
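
For anyone trying this with a more recent release of the package: as far as I can tell, the namespace is now Ganss.Xss (Ganss.XSS in some earlier versions) rather than Html, and AllowedTags is a set that you modify in place instead of assigning a new list. Treat both of those as assumptions to check against the version you actually install. The SanitizerDemo class and its Clean method are just illustrative names. A minimal sketch under those assumptions:

```csharp
using System;
using Ganss.Xss; // assumption: namespace used by recent package versions

static class SanitizerDemo
{
    public static string Clean(string htmlString)
    {
        var sanitizer = new HtmlSanitizer();

        // Replace the default tag allow-list with the same five tags
        // used in the snippet above.
        sanitizer.AllowedTags.Clear();
        foreach (var tag in new[] { "p", "ul", "li", "ol", "br" })
        {
            sanitizer.AllowedTags.Add(tag);
        }

        return sanitizer.Sanitize(htmlString);
    }

    static void Main()
    {
        Console.WriteLine(Clean("<p>Hello <script>alert(1)</script></p>"));
    }
}
```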
Licensed under: CC-BY-SA with attribution