Question

I am using VB.NET and HtmlAgilityPack.

I have the HTML below; all three sections have the same format. I am using HtmlAgilityPack to extract the title and the date from each section. My code extracts the titles correctly, but the date is taken from the first section only and repeated three times:

HtmlAgilityPack code:

For Each h4 As HtmlNode In docnews.DocumentNode.SelectNodes("//h4[(@class='title')]")
    Dim date1 As HtmlNode = docnews.DocumentNode.SelectSingleNode("//span[starts-with(@class, 'date ')]")
    Dim newsdate As String = date1.InnerText
    MessageBox.Show(h4.InnerText)
    MessageBox.Show(newsdate)
Next

I expected that, inside the loop over each h4, the query would return the date associated with that h4.

HTML code:

<div  class="article-header" style="" data-itemid="920729" data-source="ABC" data-preview="Text 1">
<h4 class="title"><a href="URL" class="title" title="Text 1">Text for Mr. A</a></h4>
    <div class="byline">
        <span class="date timestamp"><span  title="29 November 2013">29-11-2013</span></span>
        <span class="source" title="AGE">18</span>
    </div>
    <div class="preview">Text 1 Preview</div>
</div>

<div  class="article-header" style="" data-itemid="920720" data-source="ABC" data-preview="Text 2">
<h4 class="title"><a href="URL" class="title" title="Text 2">Text for Mr. B</a></h4>
    <div class="byline">
        <span class="date timestamp"><span  title="27 November 2013">27-11-2013</span></span>
        <span class="source" title="AGE">25</span>
    </div>
    <div class="preview">Text 2 Preview</div>
</div>

<div  class="article-header" style="" data-itemid="920719" data-source="ABC" data-preview="Text 3">
<h4 class="title"><a href="URL" class="title" title="Text 3">Text for Mr. C</a></h4>
    <div class="byline">
        <span class="date timestamp"><span  title="22 October 2013">22-10-2013</span></span>
        <span class="source" title="AGE">20</span>
    </div>
    <div class="preview">Text 3 Preview</div>
</div>

Final Output should be:

Text for Mr. A

29-11-2013

Text for Mr. B

27-11-2013

Text for Mr. C

22-10-2013

What I am getting with my code:

Text for Mr. A

29-11-2013

Text for Mr. B

29-11-2013

Text for Mr. C

29-11-2013

Any help is much appreciated.


Solution

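You need to anchor your second XPath so that it searches relative to the h4's container instead of the whole document:

Dim date1 As HtmlNode = h4.ParentNode.SelectSingleNode(".//span[starts-with(@class, 'date ')]")
                        ^^^^^^^^^^^^^                   ^^^

An XPath that begins with // always searches from the document root, so the original query returned the first date span in the whole document on every iteration. The leading .// tells XPath to search only beneath the node the query is executed on. By calling SelectSingleNode on h4.ParentNode (the article-header div that contains the h4), you get the date that belongs to that h4.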
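
For reference, here is a minimal sketch of the whole loop with the fix applied. It assumes a console application that references the HtmlAgilityPack package; the module name NewsDates, the inlined and trimmed HTML string, and the "(no date)" fallback are illustrative and not part of the original code. The Nothing checks are there because SelectNodes and SelectSingleNode return Nothing when nothing matches.

```vbnet
Imports System
Imports HtmlAgilityPack

Module NewsDates
    Sub Main()
        ' Trimmed copy of two of the article blocks from the question (illustrative).
        Dim html As String = "<div class=""article-header"">" &
            "<h4 class=""title""><a href=""URL"" class=""title"">Text for Mr. A</a></h4>" &
            "<div class=""byline""><span class=""date timestamp"">" &
            "<span title=""29 November 2013"">29-11-2013</span></span></div>" &
            "</div>" &
            "<div class=""article-header"">" &
            "<h4 class=""title""><a href=""URL"" class=""title"">Text for Mr. B</a></h4>" &
            "<div class=""byline""><span class=""date timestamp"">" &
            "<span title=""27 November 2013"">27-11-2013</span></span></div>" &
            "</div>"

        Dim docnews As New HtmlDocument()
        docnews.LoadHtml(html)

        ' SelectNodes returns Nothing (not an empty collection) when there is no match.
        Dim titles As HtmlNodeCollection = docnews.DocumentNode.SelectNodes("//h4[@class='title']")
        If titles Is Nothing Then Return

        For Each h4 As HtmlNode In titles
            ' Relative XPath (.//) evaluated on the h4's parent div, not on the document root.
            Dim date1 As HtmlNode = h4.ParentNode.SelectSingleNode(".//span[starts-with(@class, 'date ')]")
            Dim newsdate As String = If(date1 IsNot Nothing, date1.InnerText, "(no date)")

            Console.WriteLine(h4.InnerText)
            Console.WriteLine(newsdate)
        Next
    End Sub
End Module
```

With this change each title should be followed by the date from its own article-header block rather than the first date in the document. Console.WriteLine stands in for the MessageBox.Show calls in the question.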

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow