Question

I'm working on a Windows Phone 8 device and trying to parse the HTML document at http://www.livescience.com/41480-3d-printed-kidneys-take-small-steps.html with the HTML Agility Pack. I can get the <title> tag easily, but now I want to get all of the <p> tags from that document. I've tried these two solutions, one and two, but they don't work. This is my actual code, based on those solutions:

private void loadDoc()
{
    try
    {
        HtmlWeb.LoadAsync("http://www.livescience.com/41480-3d-printed-kidneys-take-small-steps.html", Html_Completed);
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.ToString());
    }
}

private void Html_Completed(object sender, HtmlDocumentLoadCompleted e)
{
    doc = e.Document;
    title = doc.DocumentNode.SelectSingleNode("//title");
    p = doc.DocumentNode.SelectNodes("//p");
    foreach (var node in p)
    {
        pr = node.InnerText; //that's the text you are looking for
    }
    text1.Text = title.InnerText;
    if (!pr.Equals("") && pr != "")
    {
        text2.Text = pr;
    }
    else
    {
        MessageBox.Show("null");
    }
}

I use the if block to check whether the foreach loop produced any text. Does anybody know how to solve this problem? I would appreciate any help. Thank you.

No correct solution

OTHER TIPS

Try doing:

p = doc.DocumentNode.SelectNodes(".//p");

instead of:

p = doc.DocumentNode.SelectNodes("//p");

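The leading . makes the path relative to the node you call SelectNodes on. When called on doc.DocumentNode, both expressions return the same <p> elements; the difference only matters when you select from a child node, where //p would still search the whole document.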
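SelectNodes takes an XPath expression, so the standard XPath rules for the context node apply. The sketch below illustrates the difference using .NET's built-in System.Xml XmlDocument instead of HtmlAgilityPack (an assumption made so it runs without the NuGet package); the sample markup and the XPathContextDemo class are hypothetical.

```csharp
using System.Xml;

public static class XPathContextDemo
{
    // Hypothetical sample markup: two <p> in div "a", one <p> in div "b".
    const string Sample =
        "<html><body>" +
        "<div id='a'><p>one</p><p>two</p></div>" +
        "<div id='b'><p>three</p></div>" +
        "</body></html>";

    // Evaluates the XPath with the document itself as the context node
    // (the equivalent of calling SelectNodes on doc.DocumentNode).
    public static int CountFromDocument(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        return doc.SelectNodes(xpath).Count;
    }

    // Evaluates the XPath with <div id="b"> as the context node.
    public static int CountFromDivB(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        XmlNode divB = doc.SelectSingleNode("//div[@id='b']");
        // "//p" is absolute (starts at the root); ".//p" searches below divB only.
        return divB.SelectNodes(xpath).Count;
    }
}
```

From the document, "//p" and ".//p" both find all three paragraphs; from div b, "//p" still finds three while ".//p" finds only the one inside that div.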

There is nothing wrong with your code, but it probably doesn't do what you expect. The foreach loop iterates over all the paragraphs (13 in total for the provided URL) and assigns each one's text to pr, overwriting the previous value. Because the last paragraph is empty, pr is empty after the last iteration.
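The "last one wins" behaviour can be reproduced without any HTML at all. This minimal sketch uses a plain list of strings standing in for the InnerText values of the <p> nodes; LastWinsDemo is a hypothetical name.

```csharp
using System.Collections.Generic;

public static class LastWinsDemo
{
    // Mimics the loop in the question: each iteration replaces pr,
    // so only the final paragraph's text survives.
    public static string LastParagraph(IEnumerable<string> paragraphs)
    {
        string pr = "";
        foreach (var text in paragraphs)
        {
            pr = text; // assignment, not concatenation
        }
        return pr;
    }
}
```

With input like { "First paragraph.", "Second paragraph.", "" } the result is the empty string, which is exactly what triggers the "null" message box in the question.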

If you want to fill text2 with all the paragraphs, change the assignment inside the loop like this:

pr += node.InnerText;
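Plain += runs the paragraphs together and keeps the empty ones. A sketch that skips blank paragraphs and puts a blank line between the rest, using StringBuilder; ParagraphJoiner and JoinParagraphs are hypothetical names, and the input is assumed to be the InnerText of each <p> node.

```csharp
using System.Collections.Generic;
using System.Text;

public static class ParagraphJoiner
{
    // Joins non-empty paragraph texts, separated by a blank line.
    public static string JoinParagraphs(IEnumerable<string> paragraphs)
    {
        var sb = new StringBuilder();
        foreach (var text in paragraphs)
        {
            // Skip empty or whitespace-only <p> elements such as the last one on the page.
            if (string.IsNullOrWhiteSpace(text)) continue;
            if (sb.Length > 0) sb.Append("\n\n");
            sb.Append(text.Trim());
        }
        return sb.ToString();
    }
}
```

In Html_Completed this could be called as ParagraphJoiner.JoinParagraphs(p.Select(n => n.InnerText)) (with using System.Linq), after checking that p is not null, since HtmlAgilityPack's SelectNodes returns null when nothing matches.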

If you want pr to contain readable text, you need to decode InnerText, because it can contain HTML entities such as &gt;. You can do that like this:

pr += HtmlEntity.DeEntitize(node.InnerText);
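To see what decoding does to the text, here is a small sketch that uses System.Net.WebUtility.HtmlDecode from the .NET base library as a stand-in for HtmlEntity.DeEntitize (an assumption so it runs without HtmlAgilityPack; the two are not guaranteed to handle every entity identically). EntityDecodeDemo is a hypothetical name.

```csharp
using System.Net;

public static class EntityDecodeDemo
{
    // Turns entities left in InnerText (&gt;, &amp;, &#8217; ...) into plain characters.
    // WebUtility.HtmlDecode stands in here for HtmlEntity.DeEntitize.
    public static string Decode(string innerText)
    {
        return WebUtility.HtmlDecode(innerText);
    }
}
```

For example, "3D &gt; 2D &amp; more" decodes to "3D > 2D & more", and a numeric entity such as &#8217; becomes a right single quotation mark.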

Hope this helps.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow