Recognize pattern to extract words from C# HTML Encoded String
01-07-2021
Question
I am looking for help recognizing a pattern in a string that is HTML encoded.
If I have an HTML Encoded string like:
string strHTMLText = @"<p>Pellentesque habitant [[@Code1]] morbi tristique senectus [[@Code2]] et netus et malesuada fames ac [[@Code3]] turpis egestas.</p>";
I need to extract the words [[@Code1]], [[@Code2]], and [[@Code3]]. They are dynamic and their count is unknown. These words are placeholders that will be substituted with other values in the provided HTML text.
I want to recognize the pattern [[@something]] and collect all occurrences in an array or similar collection, so that I can later fetch the corresponding values from the database.
Solution
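// Requires: using System.Linq; using System.Text.RegularExpressions; using System.Web;
string strHTMLText = @"<p>Pellentesque habitant [[@Code1]] morbi tristique senectus [[@Code2]] et netus et malesuada fames ac [[@Code3]] turpis egestas.</p>";
// Decode HTML entities first so the regex runs on the plain text.
var input = HttpUtility.HtmlDecode(strHTMLText);
// Lazily capture the name between "[[@" and "]]"; Groups[1] holds the name without the delimiters.
var list = Regex.Matches(input, @"\[\[@(.+?)\]\]")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();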
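Since the question mentions looking the extracted names up in a database afterwards, here is a minimal sketch of how extraction and substitution might fit together. The class and method names (PlaceholderDemo, ExtractNames, Substitute) and the dictionary standing in for the database are assumptions made for illustration. It also swaps HttpUtility.HtmlDecode for WebUtility.HtmlDecode from System.Net, which does not need a reference to System.Web.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class PlaceholderDemo
{
    // Matches [[@Name]] and captures Name in group 1.
    private static readonly Regex Placeholder = new Regex(@"\[\[@(.+?)\]\]");

    // Returns the distinct placeholder names found in the (possibly encoded) HTML.
    public static List<string> ExtractNames(string html)
    {
        string decoded = WebUtility.HtmlDecode(html);
        return Placeholder.Matches(decoded)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .Distinct()
            .ToList();
    }

    // Replaces each placeholder whose name is a key in 'values';
    // placeholders with unknown names are left unchanged.
    public static string Substitute(string html, IDictionary<string, string> values)
    {
        string decoded = WebUtility.HtmlDecode(html);
        return Placeholder.Replace(decoded, m =>
            values.TryGetValue(m.Groups[1].Value, out string value) ? value : m.Value);
    }

    public static void Main()
    {
        string html = "<p>Hello [[@Code1]], order [[@Code2]] shipped. Bye [[@Code1]].</p>";
        Console.WriteLine(string.Join(", ", ExtractNames(html)));

        // Hypothetical lookup table standing in for the database query.
        var values = new Dictionary<string, string> { ["Code1"] = "Ann", ["Code2"] = "#42" };
        Console.WriteLine(Substitute(html, values));
    }
}
```

Extracting the distinct names first means each value would only need to be fetched once, even when the same placeholder appears several times in the text.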
OTHER TIPS
Until someone comes along with the regex solution, for fun I did this for you:
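string strHTMLText = @"<p>Pellentesque habitant [[@Code1]] morbi tristique senectus [[@Code2]] et netus et malesuada fames ac [[@Code3]] turpis egestas.</p>";
// Splitting on '[' leaves each placeholder at the start of a segment, e.g. "@Code1]] morbi ...".
string[] arr = strHTMLText.Split(new char[] { '[' }, StringSplitOptions.RemoveEmptyEntries);
List<string> output = new List<string>();
foreach (var item in arr)
{
    int end = item.IndexOf(']');
    // Skip segments that are not placeholders (no closing bracket or no leading '@').
    if (end < 0 || item[0] != '@') continue;
    output.Add(item.Substring(0, end)); // e.g. "@Code1"
}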
To get the output into an array:
string[] result = output.ToArray();
You can use regular expressions.
Try using this expression
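// Requires: using System.Text.RegularExpressions;
// Verbatim string (@"...") so the backslashes reach the regex engine unchanged.
Regex exp = new Regex(@"\[\[@.+?\]\]");
MatchCollection mc = exp.Matches(strHTMLText);
foreach (Match m in mc)
{
    string code = m.Value; // whole placeholder, e.g. "[[@Code1]]"
}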
I have not tested this code, and it is quick-and-dirty pseudocode, so please bear with me.
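To illustrate the difference between the whole match and a capture group, here is a small sketch. The helper name FirstPlaceholder is hypothetical, and restricting names to \w+ (letters, digits, underscore) is an assumption about what placeholder names look like, not something the question states.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchValueDemo
{
    // Returns { whole match, captured name } for the first placeholder, or null if none.
    public static string[] FirstPlaceholder(string text)
    {
        // Assumption: names consist only of word characters (\w+).
        Match m = Regex.Match(text, @"\[\[@(\w+)\]\]");
        return m.Success ? new[] { m.Value, m.Groups[1].Value } : null;
    }

    public static void Main()
    {
        string[] parts = FirstPlaceholder("fames ac [[@Code3]] turpis [not a code] egestas");
        Console.WriteLine(parts[0]); // [[@Code3]]
        Console.WriteLine(parts[1]); // Code3
    }
}
```

Requiring both brackets and the @ sign in the pattern keeps ordinary bracketed text such as [not a code] from being picked up.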