문제

I've got a rather large XML file that I'm trying to parse using a C# application and the HtmlAgilityPack. The XML looks something like this:

...
<tr>
<td><b>ABC-123</b></td>
<td>15</td>
<td>4</td>
</tr>
<tr>
<td>AB-4-320</td>
<td>11</td>
<td>2</td>
</tr>
<tr>
<td><b>ABC-123</b></td>
<td>15</td>
<td>4</td>
</tr>
<tr>
<td>AB-4-320</td>
<td>11</td>
<td>2</td>
</tr>
<tr>
<td>CONTROLLER1</td>
<td>4</td>
<td>3</td>
</tr>
<td>CONTROLLER2</td>
<td>4</td>
<td>3</td>
</tr>
...

Basically a series of table rows and columns that repeats. I'm first doing a search for a controller by using:

string xPath = @"//tr/td[starts-with(.,'CONTROLLER2')]";
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xPath);
foreach (HtmlNode link in nodes) { ... }

Which returns the correct node. Now I want to search backwards (up) for the first (nearest) matching <td> node that starts with text "ABC":

string xPath = @link.XPath + @"/parent::tr/preceding-sibling::tr/td[starts-with(.,'ABC-')]";

This returns all matching nodes, not just the nearest one. When I attempted to add [1] to the end of this XPath string, it didn't seem to work and I've found no examples showing a predicate being used with an axes function like this. Or, more likely, I'm doing it wrong. Any suggestions?

도움이 되었습니까?

해결책

You can use this XPath :

/parent::tr/preceding-sibling::tr[td[starts-with(.,'ABC-')]][1]/td[starts-with(.,'ABC-')]

That will search for nearest preceding <tr> that has child <td> starts with 'ABC-'. Then get that particular <td> element.

There are at least two approaches you can pick when using HtmlAgilityPack :

foreach (HtmlNode link in nodes)
{
    //approach 1 : notice dot(.) at the beginning of the XPath
    string xPath1 = 
        @"./parent::tr/preceding-sibling::tr[td[starts-with(.,'ABC-')]][1]/td[starts-with(.,'ABC-')]";
    var n1 = node.SelectSingleNode(xPath1);
    Console.WriteLine(n1.InnerHtml);

    //approach 2 : appending to XPath of current link
    string xPath2 = 
        @"/parent::tr/preceding-sibling::tr[td[starts-with(.,'ABC-')]][1]/td[starts-with(.,'ABC-')]";
    var n2 = node.SelectSingleNode(link.XPath + xPath2);
    Console.WriteLine(n2.InnerHtml);
}

다른 팁

If you're able to use LINQ-to-XML instead of the HAP then this works:

var node = xml.Root.Elements("tr")
    .TakeWhile(tr => !tr.Elements("td")
        .Any(td => td.Value.StartsWith("CONTROLLER2")))
    .SelectMany(tr => tr.Elements("td"))
    .Where(td => td.Value.StartsWith("ABC-"))
    .Last();

I got this result:

<td>
  <b>ABC-123</b>
</td>

(Which I checked was the second matching node in your sample, not the first.)

you can use

//tr/td[starts-with(.,'CONTROLLER2')]/(parent::tr/preceding-sibling::tr/td[starts-with(normalize-space(.),'ABC-')])[1]

since the target node contains unwanted spaces, the use of normalize-space is a must.

I think an XPATH like this (from the current CONTROLLER2 node) should do it:

string xPath = "../preceding-sibling::tr[starts-with(td , 'ABC-')][1]/td[starts-with(. , 'ABC-')]";

It means

  • get back once ancestor level up (..)
  • from there, select all preceding sibling TR elements that have TD elements that start with 'ABC-'
  • get the first (reverse order) of these TR.
  • from this TR element, get TD elements that starts with 'ABC-'
라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top