Question

I want to remove all tags except for a small group of allowed tags. My text contains a large number of tags, and I want to strip all of them other than a few. A simple example is below:

<first>This <em>is</em> an <strong>testing</strong> data</first><second> testing <b>data</b> second</second><third>testing <i>data</i> third </third>

I just want to remove all the tags other than "em, strong, i, b". The output text should be as below:

This <em>is</em> an <strong>testing</strong> data testing <b>data</b> secondtesting <i>data</i> third 

How can I do this with a regex?

I have tried the following:

sampleStr = Regex.Replace(sampleStr , "<(?!(strong|em|u|i|b))>", "");

But it's not working.

Solution

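You are pretty close. Your pattern has nothing between the lookahead and the ">" to consume the tag name, so it can only match a literal "<>", and it does not account for the "/" in closing tags. Try this instead: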

<(?!/?(?:strong|em|i|u|b))[^>]+>
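As a rough sketch, assuming C# and .NET's System.Text.RegularExpressions (the language of the attempt in the question), the pattern could be applied to the sample input like this. The class name Program is only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Sample input from the question.
        string sampleStr =
            "<first>This <em>is</em> an <strong>testing</strong> data</first>" +
            "<second> testing <b>data</b> second</second>" +
            "<third>testing <i>data</i> third </third>";

        // Delete every tag whose name does not begin with one of the allowed names.
        // The verbatim string (@"...") avoids having to escape anything for C#.
        string result = Regex.Replace(sampleStr, @"<(?!/?(?:strong|em|i|u|b))[^>]+>", "");

        // Only the em, strong, b and i tags (opening and closing) remain.
        Console.WriteLine(result);
    }
}
```

Note that "second" and "testing" end up adjacent in the result, because no whitespace separates the closing second tag from the opening third tag in the input.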

Here is a "formal" explanation:

Remove Other Tags

<(?!/?(?:strong|em|i|u|b))[^>]+>

Match the character “<” literally «<»
Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!/?(?:strong|em|i|u|b))»
   Match the character “/” literally «/?»
      Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
   Match the regular expression below «(?:strong|em|i|u|b)»
      Match either the regular expression below (attempting the next alternative only if this one fails) «strong»
         Match the characters “strong” literally «strong»
      Or match regular expression number 2 below (attempting the next alternative only if this one fails) «em»
         Match the characters “em” literally «em»
      Or match regular expression number 3 below (attempting the next alternative only if this one fails) «i»
         Match the character “i” literally «i»
      Or match regular expression number 4 below (attempting the next alternative only if this one fails) «u»
         Match the character “u” literally «u»
      Or match regular expression number 5 below (the entire group fails if this one fails to match) «b»
         Match the character “b” literally «b»
Match any character that is NOT a “>” «[^>]+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “>” literally «>»


Created with RegexBuddy
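
One caveat: the lookahead only checks how the tag name begins, so tags such as br, img or ul, which start with "b", "i" or "u", are kept as well. The sketch below, again assuming C#, compares the pattern above with a stricter variant that adds a word boundary (\b) after the allowed names. The variant and the sample input are illustrative assumptions, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Hypothetical input containing tags that merely start with an allowed name.
        string input = "<p>line<br>one <b>bold</b><img src=\"a.png\"> <i>it</i></p>";

        // Pattern from the answer: a prefix check only.
        string loose = @"<(?!/?(?:strong|em|i|u|b))[^>]+>";

        // Stricter variant: \b requires the allowed name to end right there,
        // so "br" and "img" no longer count as "b" and "i".
        string strict = @"<(?!/?(?:strong|em|i|u|b)\b)[^>]+>";

        // Keeps <br> and <img ...> along with <b> and <i>.
        Console.WriteLine(Regex.Replace(input, loose, ""));

        // Keeps only <b> and <i>.
        Console.WriteLine(Regex.Replace(input, strict, ""));
    }
}
```

Both patterns are case-sensitive as written. Passing RegexOptions.IgnoreCase to Regex.Replace would be one way to also keep tags such as an uppercase B.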
Licensed under: CC-BY-SA with attribution