Question

I've encountered the need to remove comments of the form:

<!--  Foo

      Bar  -->

I'd like to use a regular expression that matches anything (including line breaks) between the beginning and end 'delimiters.'

What would a good regex be for this task?

Solution

The simple way:

Regex xmlCommentsRegex = new Regex("<!--.*?-->", RegexOptions.Singleline | RegexOptions.Compiled);

And a better way, which does not rely on '.' matching line breaks:

Regex xmlCommentsRegex = new Regex("<!--(?:[^-]|-(?!->))*-->", RegexOptions.Singleline | RegexOptions.Compiled);
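To see both patterns applied to the comment from the question, here is a minimal sketch. The class and method names (CommentRegexDemo, StripSimple, StripStrict) are made up for illustration and are not part of the original answer.

```csharp
using System.Text.RegularExpressions;

static class CommentRegexDemo
{
    // Lazy dot; Singleline is required so that '.' also matches '\n'.
    static readonly Regex Simple =
        new Regex("<!--.*?-->", RegexOptions.Singleline);

    // Any non-dash character, or a dash that is not followed by "->".
    // No option is needed because [^-] already matches line breaks.
    static readonly Regex Strict =
        new Regex("<!--(?:[^-]|-(?!->))*-->");

    // Both helpers replace every match with the empty string.
    public static string StripSimple(string xml) => Simple.Replace(xml, "");
    public static string StripStrict(string xml) => Strict.Replace(xml, "");
}
```

For the input "<a><!--  Foo\n\n      Bar  --><b/></a>", both helpers return "<a><b/></a>".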

OTHER TIPS

None. XML is not a regular language, so it cannot be fully described by a regular expression: regular expressions are based on regular grammars, and XML needs at least a context-free grammar.

Suppose this thread is exported as XML. Your example (<!-- FOO Bar -->), if enclosed in a CDATA section, would be stripped by the regex, even though inside CDATA it is plain text and not a comment.
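The CDATA problem can be reproduced with a short sketch. The class name, method names, and sample element (CdataPitfallDemo, RootText, RegexStrip, <post>) are assumptions made for this illustration only.

```csharp
using System.Text.RegularExpressions;
using System.Xml.Linq;

static class CdataPitfallDemo
{
    // Text content of the root element, as an XML parser sees it.
    public static string RootText(string xml) => XDocument.Parse(xml).Root.Value;

    // The comment regex from the solution above, applied to the raw text.
    public static string RegexStrip(string xml) =>
        Regex.Replace(xml, "<!--(?:[^-]|-(?!->))*-->", "");
}
```

With xml = "<post><![CDATA[Example: <!-- FOO Bar -->]]></post>", RootText(xml) is "Example: <!-- FOO Bar -->", while RootText(RegexStrip(xml)) is only "Example: ". The regex has silently changed the document's text content.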

The 'proper' way would be to use XSLT and copy everything but comments.
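One possible shape of the XSLT approach, driven from C#, is sketched below. It assumes an XSLT 1.0 identity transform plus an empty template for comments; the class name XsltCommentRemover and the embedded stylesheet are illustrative and not taken from the original answer.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class XsltCommentRemover
{
    // First template: identity copy of every attribute and node.
    // Second template: matches comments and emits nothing; the explicit
    // priority makes it win over the identity template.
    const string Stylesheet = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:template match='@*|node()'>
    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>
  </xsl:template>
  <xsl:template match='comment()' priority='1'/>
</xsl:stylesheet>";

    public static string RemoveComments(string xml)
    {
        var xslt = new XslCompiledTransform();
        using (var styleReader = XmlReader.Create(new StringReader(Stylesheet)))
            xslt.Load(styleReader);

        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

The result is a re-serialized document, so it will not be byte-for-byte identical to the input: an XML declaration is typically added, and CDATA sections may come back as escaped text.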

Parsing XML with regex is considered bad style. Use an XML parsing library instead.
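As a rough sketch of the parser-based route, LINQ to XML (System.Xml.Linq) can drop comment nodes directly. The class name XmlCommentRemover is hypothetical, and this is one option among several XML libraries.

```csharp
using System.Linq;
using System.Xml.Linq;

static class XmlCommentRemover
{
    public static string RemoveComments(string xml)
    {
        // PreserveWhitespace keeps the original layout as far as possible.
        XDocument doc = XDocument.Parse(xml, LoadOptions.PreserveWhitespace);

        // Only real comment nodes are removed; comment-like text inside
        // CDATA sections or attribute values is left alone.
        doc.DescendantNodes().OfType<XComment>().Remove();

        // Note: XDocument.ToString() does not emit the XML declaration.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Because the parser knows which nodes are comments, a call such as RemoveComments("<a><!-- x --><![CDATA[<!-- keep -->]]></a>") removes the first comment and keeps the CDATA text.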

Here is sample code that reads an XML file and returns its contents as a string with the comments removed.

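// Requires: using System.IO; using System.Text.RegularExpressions;
// The method wrapper (name is illustrative) is added so that 'return' compiles.
static string ReadXmlWithoutComments()
{
  // Verbatim string: in a regular literal, "\f" would be a form-feed escape.
  var text = File.ReadAllText(@"c:\file.xml");

  const string strRegex = @"<!--(?:[^-]|-(?!->))*-->";
  // Multiline only changes ^ and $, so it has no effect on this pattern.
  const RegexOptions myRegexOptions = RegexOptions.Multiline;
  Regex myRegex = new Regex(strRegex, myRegexOptions);
  const string strReplace = @"";

  // Replace every comment with the empty string.
  string result = myRegex.Replace(text, strReplace);
  return result;
}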

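Note that RegexOptions.Multiline does not make '.' match line breaks, which is slightly counterintuitive: it only changes the meaning of ^ and $. A pattern that uses '.', such as <!--.*?-->, needs RegexOptions.Singleline to span several lines. The pattern above contains no '.', and [^-] already matches line breaks, so it works with any of these options.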
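The point about RegexOptions.Multiline can be checked with a small sketch. The class and method names here (DotOptionsDemo, LazyDotMatches, NoDotMatches) are invented for the example.

```csharp
using System.Text.RegularExpressions;

static class DotOptionsDemo
{
    // Does the lazy-dot pattern match under the given options?
    public static bool LazyDotMatches(string input, RegexOptions options) =>
        Regex.IsMatch(input, "<!--.*?-->", options);

    // The dot-free pattern, with no options at all.
    public static bool NoDotMatches(string input) =>
        Regex.IsMatch(input, "<!--(?:[^-]|-(?!->))*-->");
}
```

For input = "<!--  Foo\n\n      Bar  -->", LazyDotMatches(input, RegexOptions.Multiline) is false, LazyDotMatches(input, RegexOptions.Singleline) is true, and NoDotMatches(input) is true.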

Licensed under: CC-BY-SA with attribution