Question

I am looking for help recognizing a pattern in a string that is HTML-encoded.

If I have an HTML Encoded string like:

string strHTMLText=@"&lt;p&gt;Pellentesque habitant [[@Code1]] morbi tristique senectus [[@Code2]] et netus et malesuada fames ac [[@Code3]] turpis egestas.&lt;/p&gt;";

I need to extract the words [[@Code1]], [[@Code2]], and [[@Code3]]. They are dynamic and their count is unknown. These words stand in for other values in the provided HTML text.

I want to recognize the pattern [[@something]] and collect all occurrences in an array or similar collection, so that I can later fetch the corresponding values from the database.

Solution

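// Requires: using System.Linq; using System.Text.RegularExpressions; using System.Web;
string strHTMLText=@"&lt;p&gt;Pellentesque habitant [[@Code1]] morbi tristique senectus [[@Code2]] et netus et malesuada fames ac [[@Code3]] turpis egestas.&lt;/p&gt;";
// Decode the HTML entities first, then capture the name inside each [[@...]] placeholder.
var input = HttpUtility.HtmlDecode(strHTMLText);
var list = Regex.Matches(input, @"\[\[@(.+?)\]\]")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)   // "Code1", "Code2", "Code3"
    .ToList();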
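
Since the question mentions looking the values up afterwards, here is a rough sketch of how the same pattern could also do the substitution in one pass. It is only an illustration under a few assumptions: the class name PlaceholderDemo and the methods Extract and Fill are made up for this example, the dictionary is a stand-in for whatever the database lookup returns, and WebUtility.HtmlDecode (from System.Net) is swapped in for HttpUtility.HtmlDecode so that no System.Web reference is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class PlaceholderDemo
{
    // Matches [[@Name]] and captures Name in group 1.
    private static readonly Regex Placeholder = new Regex(@"\[\[@(.+?)\]\]");

    // Returns the distinct placeholder names found in the HTML-encoded text.
    public static List<string> Extract(string encodedHtml)
    {
        string input = WebUtility.HtmlDecode(encodedHtml);
        return Placeholder.Matches(input)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .Distinct()
            .ToList();
    }

    // Replaces each placeholder that has an entry in 'values' (assumed to be
    // loaded from the database beforehand); unknown placeholders are left as-is.
    public static string Fill(string encodedHtml, IDictionary<string, string> values)
    {
        string input = WebUtility.HtmlDecode(encodedHtml);
        return Placeholder.Replace(input, m =>
            values.TryGetValue(m.Groups[1].Value, out string v) ? v : m.Value);
    }
}
```

One possible flow would be to call Extract first, fetch the values for those names in a single query, and then pass the resulting dictionary to Fill. Leaving unknown placeholders untouched is a design choice for this sketch; throwing or substituting an empty string would work just as well, depending on what you need.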

OTHER TIPS

Until someone comes along with the regex solution, for fun I did this for you:

string strHTMLText=@"&lt;p&gt;Pellentesque habitant [[@Code1]] morbi tristique senectus [[@Code2]] et netus et malesuada fames ac [[@Code3]] turpis egestas.&lt;/p&gt;";

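// Split on '[' and drop the empty pieces produced by the doubled "[[".
string[] arr = strHTMLText.Split(new char[] {'['}, StringSplitOptions.RemoveEmptyEntries);
List<string> output = new List<string>();
foreach(var item in arr)
{
    int end = item.IndexOf(']');
    if (end < 0) continue; // the text before the first placeholder has no ']'
    string placeHolder = item.Substring(0, end); // e.g. "@Code1"
    output.Add(placeHolder);
}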

To get the output into an array:

output.ToArray();

You can use regular expressions.

Try using this expression

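Regex exp = new Regex(@"\[\[@.+?\]\]"); // verbatim string, so the backslashes reach the regex engine
MatchCollection mc = exp.Matches(yourString); // yourString: the text to search
foreach(Match m in mc)
{
   string code = m.Value; // the full placeholder, e.g. "[[@Code1]]"
}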

I have not tested this code; it is quick-and-dirty pseudocode, so please bear with me.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow