Question

<tr id='tr1' align='center' border=0 class='headerclass'>
  Example text
  <tr id='tr11' align='center' border=0 bgColor='99ccff'> 
    <td id='td1' class='headerclass'>Example Header 1 </td>
    <td id='td2' class='headerclass'>Example Header 2 </td>
    <td id='td3' class='headerclass'>Example Header 3 </td>
  </tr>
  <tr id='tr12' align='center' bgColor='white'>
    <td id='v1' class='colclass'>value 1</td>
    <td id='v2' class='colclass'>value 2</td>
    <td id='v3' class='colclass'>value 3</td>
  </tr>
</tr>

Above is the HTML that I want to scrape. I want to get only "Example text", which sits directly inside the outer <tr></tr>. I tried InnerText (code shown below), but it also returns the text of every nested <td></td>, which is not what I want.

var nodes = htmlDoc.DocumentNode.SelectNodes("//tr").Where(x => x.Attributes["id"] != null && x.Attributes["id"].Value.Contains("tr1"));
foreach (var htmlNode in nodes)
{
   Console.WriteLine(htmlNode.InnerText);
}

Output:

Example text
Example Header 1
Example Header 2
Example Header 3 
value 1
value 2
value 3

Thank you.


Solution

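You could do it like this, reading only the first child node of the matching row (the text node that holds "Example text") instead of the InnerText of the whole row: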

var text = doc.DocumentNode.Descendants("tr")
              .First(p => p.Attributes["id"] != null &&
                          p.Attributes["id"].Value.Contains("tr1"))
              .ChildNodes[0].InnerText.Trim();

The output is:

Example text
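
If the text is not guaranteed to be the very first child node, a slightly more defensive variant is to keep every direct child that is a text node and skip the element children. The following is only a sketch under a few assumptions: the HtmlAgilityPack package is referenced, the markup is a shortened, hypothetical version of the one in the question, and the id is compared exactly (rather than with Contains) so that 'tr11' and 'tr12' cannot match.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical, shortened version of the markup from the question.
        var html = @"<tr id='tr1' align='center' class='headerclass'>
  Example text
  <tr id='tr11' align='center'>
    <td id='td1' class='headerclass'>Example Header 1</td>
  </tr>
</tr>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Exact id comparison; GetAttributeValue returns the default ("")
        // when the attribute is missing, so no null check is needed.
        var row = doc.DocumentNode.Descendants("tr")
                     .FirstOrDefault(n => n.GetAttributeValue("id", "") == "tr1");

        if (row == null)
        {
            Console.WriteLine("Row not found");
            return;
        }

        // Keep only the direct children that are text nodes, so the text
        // of nested <tr>/<td> elements is not included.
        var ownText = string.Join(" ",
            row.ChildNodes
               .Where(n => n.NodeType == HtmlNodeType.Text)
               .Select(n => n.InnerText.Trim())
               .Where(t => t.Length > 0));

        Console.WriteLine(ownText);
    }
}
```

Compared with ChildNodes[0], this does not depend on the position of the text inside the row, at the cost of also picking up any other stray text placed directly in that row.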
Licensed under: CC-BY-SA with attribution