Grabbing all the raw text at once is not recommended: you would then have to split it back into fields yourself, which is painful and error-prone.
Try this instead: select each <td> by its specific class and read InnerText, not InnerHtml:
List<string> topicList = new List<string>();
List<string> authorList = new List<string>();
List<string> lastPostList = new List<string>();
foreach (HtmlNode node in htmlDoc.DocumentNode.SelectNodes("//td[@class='topic starter']"))
{
string topic;
topic = node.InnerText;
topicList.Add(topic);
}
foreach (HtmlNode node in htmlDoc.DocumentNode.SelectNodes("//td[@class='author']"))
{
string author;
author = node.InnerText;
authorList.Add(author);
}
foreach (HtmlNode node in htmlDoc.DocumentNode.SelectNodes("//td[@class='lastpost']"))
{
string lastpost;
lastpost = node.InnerText;
lastPostList.Add(lastpost); // This also captures the author of the last post (e.g. Antony 24/10/09).
}
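One thing to watch for: in HtmlAgilityPack, SelectNodes returns null rather than an empty collection when nothing matches, so the loops above throw if a class is missing from the page. The three loops can also be folded into one helper. A minimal sketch, assuming the same htmlDoc as above; CellReader and ReadCells are hypothetical names, not part of the library:

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

static class CellReader
{
    // Hypothetical helper (not part of HtmlAgilityPack): collects the text
    // of every <td> whose class attribute equals cssClass exactly.
    public static List<string> ReadCells(HtmlDocument htmlDoc, string cssClass)
    {
        var result = new List<string>();
        HtmlNodeCollection nodes =
            htmlDoc.DocumentNode.SelectNodes("//td[@class='" + cssClass + "']");

        // SelectNodes returns null when no node matches the XPath.
        if (nodes == null)
            return result;

        foreach (HtmlNode node in nodes)
        {
            // Decode entities such as &amp; and trim surrounding whitespace.
            result.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return result;
    }
}
```

Usage would look like: List<string> authorList = CellReader.ReadCells(htmlDoc, "author");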
If you need the last poster's name and the date as separate values, you can call the Split() method on the string.
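As a rough sketch of that last step, assuming the cell text always ends with the date as its final whitespace-separated token (as in "Antony 24/10/09") and that author names may contain spaces. LastPostParser and SplitLastPost are hypothetical names, and the real page layout may need different handling:

```csharp
using System;

static class LastPostParser
{
    // Hypothetical helper: splits "Antony 24/10/09" into author and date.
    // Assumes the date is the last whitespace-separated token.
    public static (string Author, string Date) SplitLastPost(string text)
    {
        // A null separator array makes Split break on any whitespace.
        string[] parts = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        if (parts.Length == 0)
            return (string.Empty, string.Empty);
        if (parts.Length == 1)
            return (parts[0], string.Empty); // no date found

        // Everything before the last token is treated as the author name.
        string author = string.Join(" ", parts, 0, parts.Length - 1);
        string date = parts[parts.Length - 1];
        return (author, date);
    }
}
```

For each entry in lastPostList you could then write: var (author, date) = LastPostParser.SplitLastPost(entry);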