Question

Hi, I'm trying to figure out a way to remove the tags from the results returned by the Google Feed API. Specifically, it places bold tags in titles and inside the description.

The codes that are being inserted are as follows:

\u003cb\u003e
\u003c/b\u003e

Since it's a fixed set, I tried calling String.Replace() for each of these codes on every string, but not surprisingly it performed badly. I'm not sure whether Regex would be better (or worse). Does anyone have an idea how to remove these? Google does not supply an option to remove tags from the results.


Solution

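You could remove the escaped tags using a regex. A pattern that matches only the Unicode escapes, such as \\u[0-9a-fA-F]{4}, would leave the tag names (b and /b) behind, so match the whole escaped tag instead:

\\u003c/?b\\u003e

// requires: using System.Text.RegularExpressions;
var subject = @"\u003cb\u003eTitle\u003c/b\u003e";
// Removes both the opening and the closing escaped bold tag.
var result = Regex.Replace(subject, @"\\u003c/?b\\u003e", String.Empty); // "Title"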

As for performance, this article seems to suggest that regex is much slower, but I would run your own tests with your own data, as the results might be wildly different. The regular expression itself plays a big part in performance, and I don't think that article states which regex was used, so it's impossible to compare. The size and type of your data also matter, so it's difficult to say which is better without understanding your data.

Also, you should try compiling the regex with the RegexOptions.Compiled flag to see if that boosts performance.
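
To compare the two approaches on your own data, something like the following rough sketch could be a starting point. It assumes the pattern \\u003c/?b\\u003e (which matches the whole escaped bold tag), and the TagStripper class, its method names, the sample string and the iteration count are all made up for illustration. It is a crude Stopwatch timing, not a rigorous benchmark, so treat the numbers as indicative only.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class TagStripper
{
    // Built once and reused; RegexOptions.Compiled costs more at startup
    // but may match faster when the regex is used many times.
    private static readonly Regex BoldTag =
        new Regex(@"\\u003c/?b\\u003e", RegexOptions.Compiled);

    // Regex-based removal of the escaped opening and closing bold tags.
    public static string WithRegex(string s) => BoldTag.Replace(s, string.Empty);

    // Plain chained String.Replace, one call per escaped tag.
    public static string WithReplace(string s) =>
        s.Replace(@"\u003cb\u003e", string.Empty)
         .Replace(@"\u003c/b\u003e", string.Empty);
}

static class Program
{
    static void Main()
    {
        // Hypothetical sample; substitute real titles and descriptions from the feed.
        var sample = @"\u003cb\u003eGoogle\u003c/b\u003e Feed API title";
        const int iterations = 100000;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) TagStripper.WithRegex(sample);
        Console.WriteLine("Regex:   " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        for (int i = 0; i < iterations; i++) TagStripper.WithReplace(sample);
        Console.WriteLine("Replace: " + sw.ElapsedMilliseconds + " ms");
    }
}
```

Keeping the Regex in a static field matters here: constructing a compiled regex on every call would likely cost more than it saves.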

Licensed under: CC-BY-SA with attribution