Question

I'm trying to parse a website in C# using Fizzler. My goal is to get this element: /html/body/form/div[3]/div/div/div/div/div/table/tbody/tr[18]/td[2]/span (Firebug XPath).

The problem is that the TR and TD indices are not fixed. All I know is that I always need the LAST span, in the LAST TD, in the LAST TR :)

I tried this, but all I get is NULL:

using System.Linq;      // for .Last()
using HtmlAgilityPack;

HtmlWeb document = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = document.Load("http://websiteaddress.com/site-name.html");

HtmlNode lastSpan = doc.DocumentNode.SelectNodes("//table/tbody/tr/td/span").Last();

This is the TABLE I'm trying to parse. I only need the content of the last span in the last td of the last row.

            <table id="ctl00_WebPartManager1_blablabla_ctl00_tblRates" cellspacing="5" cellpadding="5" rules="all" border="1" style="width:100%;">
                <tr>
                    <th></th><th><span>USD</span></th>
                </tr><tr>
                    <th></th><th><span>USA $</span></th>
                </tr><tr>
                    <th></th><th><span>1</span></th>
                </tr><tr>
                    <td><span>2014. 03. 03.</span></td><td><span>227,31 </span></td>
                </tr><tr>
                    <td><span>2014. 03. 04.</span></td><td><span>226,79 </span></td>
                </tr><tr>
                    <td><span>2014. 03. 05.</span></td><td><span>225,66 </span></td>
                </tr><tr>
                    <td><span>2014. 03. 06.</span></td><td><span>225,03 </span></td>
                </tr><tr>
                    <td><span>2014. 03. 07.</span></td><td><span>223,14 </span></td>
                </tr><tr>
                    <td><span>2014. 03. 10.</span></td><td><span>224,63 </span></td>
                </tr><tr>
                    <td><span>2014. 03. 11.</span></td><td><span>226,06 </span></td>
                </tr><tr>
                    <td><span>2014. 03. 12.</span></td><td><span>226,53 </span></td>
                </tr><tr>
                    <td><span>2014. 03. 13.</span></td><td><span>223,63 </span></td>
                </tr><tr>
                    <td><span>2014. 03. 14.</span></td><td><span>225,74 </span></td>
                </tr><tr>
                    <td><span>2014. 03. 17.</span></td><td><span>224,67 </span></td>
                </tr><tr>
                    <td><span>2014. 03. 18.</span></td><td><span>224,65 </span></td>
                </tr><tr>
                    <td><span>2014. 03. 19.</span></td><td><span>223,26 </span></td>
                </tr><tr>
                    <td><span>2014. 03. 20.</span></td><td><span>225,94 </span></td>
                </tr><tr>
                    <td><span>2014. 03. 21.</span></td><td><span>226,25 </span></td>
                </tr>
            </table>

This is the result I get from the document.Load() method (it's kind of messed up...):

    <table id="ctl00_WebPartManager1_blablabla_ctl00_tblRates" cellspacing="5" cellpadding="5" rules="all" border="1" style="width:100%;">
                <tr>
                    <th><th><span>USD</span>
                <tr>
                    <th><th><span>USA $</span>
                <tr>
                    <th><th><span>1</span>
                <tr>
                    <td><span>2014. 03. 03.</span><td><span>227,31 </span>
                <tr>
                    <td><span>2014. 03. 04.</span><td><span>226,79 </span>
                <tr>
                    <td><span>2014. 03. 05.</span><td><span>225,66 </span>
                <tr>
                    <td><span>2014. 03. 06.</span><td><span>225,03 </span>
                <tr>
                    <td><span>2014. 03. 07.</span><td><span>223,14 </span>
                <tr>
                    <td><span>2014. 03. 10.</span><td><span>224,63 </span>
                <tr>
                    <td><span>2014. 03. 11.</span><td><span>226,06 </span>
                <tr>
                    <td><span>2014. 03. 12.</span><td><span>226,53 </span>
                <tr>
                    <td><span>2014. 03. 13.</span><td><span>223,63 </span>
                <tr>
                    <td><span>2014. 03. 14.</span><td><span>225,74 </span>
                <tr>
                    <td><span>2014. 03. 17.</span><td><span>224,67 </span>
                <tr>
                    <td><span>2014. 03. 18.</span><td><span>224,65 </span>
                <tr>
                    <td><span>2014. 03. 19.</span><td><span>223,26 </span>
                <tr>
                    <td><span>2014. 03. 20.</span><td><span>225,94 </span>
                <tr>
                    <td><span>2014. 03. 21.</span><td><span>226,25 </span>

            </td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></th></th></tr></th></th></tr></th></th></tr></table>

Can someone help me out, please?

Thank you very much!


Solution

The <tbody> element is inserted by the browser (which is why it shows up in Firebug's XPath) but not by HTML Agility Pack, which works on the raw HTML; that is why you don't receive any results at all. Leave tbody out of the path and use last() predicates to select the last child of each element:

//table/tr[last()]/td[last()]/span[last()]
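A minimal sketch of both points, assuming a trimmed copy of the question's table as a well-formed string. It uses .NET's built-in System.Xml XPath engine instead of HTML Agility Pack so it runs without a NuGet package; the class and method names are hypothetical.

```csharp
using System;
using System.Xml;

public static class LastRateDemo
{
    // Trimmed copy of the question's table. Like the raw HTML that
    // HTML Agility Pack sees, it has no <tbody> (the browser adds that itself).
    public const string SampleTable =
        "<table id=\"ctl00_WebPartManager1_blablabla_ctl00_tblRates\">" +
        "<tr><th></th><th><span>USD</span></th></tr>" +
        "<tr><td><span>2014. 03. 20.</span></td><td><span>225,94 </span></td></tr>" +
        "<tr><td><span>2014. 03. 21.</span></td><td><span>226,25 </span></td></tr>" +
        "</table>";

    // Returns the trimmed text of the first node matched by xpath, or null if nothing matches.
    public static string SelectText(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText.Trim();
    }

    public static void Main()
    {
        // The Firebug path contains tbody, which is not in the source, so nothing matches.
        Console.WriteLine(SelectText(SampleTable, "//table/tbody/tr/td/span") ?? "(null)"); // (null)

        // last() at each step: last tr, then its last td, then that td's last span.
        Console.WriteLine(SelectText(SampleTable, "//table/tr[last()]/td[last()]/span[last()]")); // 226,25
    }
}
```

With HTML Agility Pack the same path string can be passed to doc.DocumentNode.SelectSingleNode(...), which likewise returns null when nothing matches.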

You can also query the very last span in the document, but this will probably be a little slower, as the whole result set has to be constructed first:

(//table/tr/td/span)[last()]
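The two expressions agree on a page with a single table, but not on a page with several. A sketch with System.Xml and a made-up two-table fragment (both assumptions for illustration; names are hypothetical) shows the difference: the path form returns the last span of every table, while the parenthesized form applies last() to the combined node-set and returns a single span.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class LastSpanComparison
{
    // Hypothetical page fragment with two tables; only the second holds the rate.
    public const string TwoTables =
        "<body>" +
        "<table><tr><td><span>menu</span></td></tr></table>" +
        "<table><tr><td><span>2014. 03. 21.</span></td><td><span>226,25</span></td></tr></table>" +
        "</body>";

    // Returns the text of every node matched by xpath, in document order.
    public static List<string> SelectAll(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var texts = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath))
        {
            texts.Add(node.InnerText);
        }
        return texts;
    }

    public static void Main()
    {
        // Per-step last(): one match per table -> menu | 226,25
        Console.WriteLine(string.Join(" | ", SelectAll(TwoTables, "//table/tr[last()]/td[last()]/span[last()]")));

        // Parenthesized: last() applies to the whole node-set -> 226,25
        Console.WriteLine(string.Join(" | ", SelectAll(TwoTables, "(//table/tr/td/span)[last()]")));
    }
}
```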

Using .Last() in C# would be slightly worse still, as the whole result set first has to be materialized as a .NET collection before all but the last node is discarded.
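To illustrate (not benchmark) the difference, here is a sketch assuming System.Xml in place of HTML Agility Pack and a two-row excerpt of the table; the class and method names are hypothetical. Both methods return the same text, but in the LINQ version the XPath engine hands back every span and Last() then enumerates the whole list to reach the final one.

```csharp
using System;
using System.Linq;
using System.Xml;

public static class LinqLastVersusXPathLast
{
    public const string Rows =
        "<table>" +
        "<tr><td><span>2014. 03. 20.</span></td><td><span>225,94</span></td></tr>" +
        "<tr><td><span>2014. 03. 21.</span></td><td><span>226,25</span></td></tr>" +
        "</table>";

    public static XmlDocument LoadRows()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Rows);
        return doc;
    }

    // LINQ: every span is returned to C#, then Last() enumerates the list to the end.
    public static string ViaLinq(XmlDocument doc) =>
        doc.SelectNodes("//table/tr/td/span").Cast<XmlNode>().Last().InnerText;

    // XPath: last() is evaluated by the XPath engine, so only the selected node is returned.
    public static string ViaXPath(XmlDocument doc) =>
        doc.SelectSingleNode("(//table/tr/td/span)[last()]").InnerText;

    public static void Main()
    {
        XmlDocument doc = LoadRows();
        Console.WriteLine(ViaLinq(doc));  // 226,25
        Console.WriteLine(ViaXPath(doc)); // 226,25
    }
}
```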

Other tips

You can use last() instead of an exact element position to select the last matching element at each step:

//table/tr[last()]/td[last()]/span[last()]

The XPath above selects the last <tr>, then the last <td> in that <tr>, then the last <span> in that <td>.
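The same three steps can be written out by hand, which shows that each last() applies to the children of the element chosen in the previous step rather than to one global result set. This sketch uses LINQ to XML (System.Xml.Linq) on an excerpt of the question's table instead of HTML Agility Pack; the names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class StepByStepLast
{
    public const string Table =
        "<table>" +
        "<tr><th></th><th><span>USD</span></th></tr>" +
        "<tr><td><span>2014. 03. 20.</span></td><td><span>225,94</span></td></tr>" +
        "<tr><td><span>2014. 03. 21.</span></td><td><span>226,25</span></td></tr>" +
        "</table>";

    // Mirrors tr[last()]/td[last()]/span[last()] one step at a time;
    // returns null if any step finds nothing (e.g. a last row with only <th> cells).
    public static string LastSpanText(XElement table)
    {
        XElement tr = table.Elements("tr").LastOrDefault();     // last <tr> child of the table
        XElement td = tr?.Elements("td").LastOrDefault();       // last <td> child of that row
        XElement span = td?.Elements("span").LastOrDefault();   // last <span> child of that cell
        return span?.Value;
    }

    public static void Main()
    {
        XElement table = XElement.Parse(Table);
        Console.WriteLine(LastSpanText(table)); // 226,25
    }
}
```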

If this isn't what you're looking for, I'd suggest posting sample HTML so we can better understand the problem.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow