How can I strip all regular HTML tags except <a></a>, <img> (keeping its attributes) and <br> with JavaScript?

StackOverflow https://stackoverflow.com/questions/18128125

  •  24-06-2022

Question

When a user creates a message, there is a multibox connected to a design panel that lets users change the font, color, size, etc. When the message is submitted, it is displayed with HTML tags if the user has changed the color, size, etc. of the font.

Note: I need the design panel. I know it's possible to remove it, but that is not an option here :)

It's a SharePoint standard. The only solution I have is to use JavaScript to strip these tags when the message is displayed. The user should only be able to insert links and images and add line breaks.

This means that all HTML tags should be stripped except the <a></a>, <img> and <br> tags.

It's also important that the attributes inside the <img> tag are not removed. It could be displayed like this:

<img src="/image/Penguins.jpg" alt="Penguins.jpg" style="margin:5px;width:331px;">

How can I accomplish this with javascript?

I used to use the following C# code-behind, which worked perfectly but stripped all HTML tags except <br>.

public string Strip(string text)
{
   return Regex.Replace(text, @"<(?!br[\x20/>])[^<>]+>", string.Empty);
}

Any kind of help is much appreciated.


Solution

Does this do what you want? http://jsfiddle.net/smerny/r7vhd/

$("body").find("*").not("a,img,br").each(function() {
    $(this).replaceWith(this.innerHTML);
});

Basically, select everything except a, img and br, and replace each selected element with its content.

OTHER TIPS

Smerny's answer works well except when the HTML structure looks like this:

var s = '<div><div><a href="link">Link</a><span> Span</span><li></li></div></div>';
var $s = $(s);
$s.find("*").not("a,img,br").each(function() {
    $(this).replaceWith(this.innerHTML);
});
console.log($s.html());

The live code is here: http://jsfiddle.net/btvuut55/1/

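This happens when there are two or more nested wrappers (two divs in the example above).

jQuery reaches the outermost matched element first and replaces it with its innerHTML. That creates fresh copies of the nested elements, while the ones already collected by find() are now detached from the document, so replacing them later changes nothing and the span is retained.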

The unwrap answer, $('#container').find('*:not(br,a,img)').contents().unwrap(), fails on tags with empty content: they have no child nodes to unwrap, so the empty tags stay in place.

A working solution is simple: loop from the innermost element outward:

var $elements = $s.find("*").not("a,img,br");
for (var i = $elements.length - 1; i >= 0; i--) {
    var e = $elements[i];
    $(e).replaceWith(e.innerHTML);
}

The working copy is: http://jsfiddle.net/btvuut55/3/
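
The innermost-first idea can also be sketched without jQuery or a DOM, on a plain object tree. This is only an illustration under assumptions: the node shape { tag, children } and the function name unwrapDisallowed are hypothetical, not part of any library.

```javascript
// Hypothetical node shape: { tag: "div", children: [ ...nodes or text strings ] }
const ALLOWED = new Set(["a", "img", "br"]);

// Returns the list of nodes that should take the place of `node`.
function unwrapDisallowed(node) {
  // Text is always kept as-is.
  if (typeof node === "string") return [node];
  // Clean the children first, so inner elements are handled before their wrappers.
  const children = node.children.flatMap(unwrapDisallowed);
  // Allowed element: keep it, with its cleaned children.
  if (ALLOWED.has(node.tag)) return [{ ...node, children }];
  // Disallowed element: replace it with its already-cleaned content.
  return children;
}

// Same structure as the example string above: div > div > (a, span, li)
const tree = {
  tag: "div",
  children: [{
    tag: "div",
    children: [
      { tag: "a", children: ["Link"] },
      { tag: "span", children: [" Span"] },
      { tag: "li", children: [] }
    ]
  }]
};

console.log(JSON.stringify(unwrapDisallowed(tree)));
```

Because every wrapper is replaced only after its content has been cleaned, nested wrappers and empty elements are both handled in a single pass.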

With jQuery you can find all the elements you don't want, then use unwrap to strip the tags:

$('#container').find('*:not(br,a,img)').contents().unwrap()


I think it would be better to extract the good tags. It is easier to match a few tags than to remove every other element and all the HTML possibilities. Try something like this; I tested it and it works fine:

// the following regex matches the good tags with attributes and inner content
var ptt = new RegExp("<(?:img|a|br){1}.*/?>(?:(?:.|\n)*</(?:img|a|br){1}>)?", "g");
var input = "<this string would contain the html input to clean>";
var result = "";

var match = ptt.exec(input);
while (match) {
    // match is an array; match[0] holds the full matched text
    result += match[0];
    match = ptt.exec(input);
}

// result will contain the clean HTML with only the good tags
console.log(result);
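
If you would rather stay close to the C# regex from the question, it can probably be ported to JavaScript with the allowlist widened from <br> to <a>, <img> and <br>. This is a minimal sketch, not a tested drop-in: the names ALLOWED_TAGS, DISALLOWED_TAG and stripTags are my own, and it assumes attribute values never contain < or >.

```javascript
// Assumption: only these tags may stay, as stated in the question.
const ALLOWED_TAGS = ["a", "img", "br"];

// Matches any opening or closing tag whose name is not in the allowlist.
// The negative lookahead skips allowed names only when they are followed by
// whitespace, "/" or ">", so <abbr> or <b> are not mistaken for <a> or <br>.
const DISALLOWED_TAG = new RegExp(
  "<(?!\\/?(?:" + ALLOWED_TAGS.join("|") + ")(?=[\\s/>]))[^<>]+>",
  "gi"
);

// Removes disallowed tags but keeps their text content and all allowed tags,
// including the attributes on those allowed tags.
function stripTags(html) {
  return html.replace(DISALLOWED_TAG, "");
}

const sample =
  '<div><span style="color:red">Hi</span> ' +
  '<img src="/image/Penguins.jpg" alt="Penguins.jpg" style="margin:5px;width:331px;">' +
  '<br><b>bold</b> <a href="link">Link</a></div>';

console.log(stripTags(sample));
```

Unlike the extraction approach, this keeps the text between tags. It leaves the attributes on the allowed tags untouched, so it is a display clean-up rather than a security filter.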
Licensed under: CC-BY-SA with attribution