Question

I wrote this piece of code to get the HTML from a URL entered by the user. I used the HtmlAgilityPack because I want to work with only specific parts of the markup (body, title, etc.). I succeeded in "downloading" the data from the website, but I guess my XPath is somehow incorrect. Here is the relevant code:

Dim htmlWeb As String = URL ' inserted by the user
Dim htmlDoc As HtmlAgilityPack.HtmlDocument = New HtmlAgilityPack.HtmlDocument
    htmlDoc.LoadHtml(htmlWeb)
Dim htmlText As String
    htmlDoc.OptionFixNestedTags = True
Dim myBR As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("...")
    htmlText = myBR("...").InnerText

    For Each Match_Positive_Word As Match In Regex.Matches(htmlText, Positive_Words)
        Positive_Counter = Positive_Counter + 1
    Next

    For Each Match_Negative_Word As Match In Regex.Matches(htmlText, Negative_Words)
        Negative_Counter = Negative_Counter + 1
    Next

Questions:

  • What do I need to write inside the brackets to get, for instance, the data inside the body tag?
  • Is the way I load the HTML correct? Is there a better or more efficient way to do this?

EDIT

When I call htmlDoc.Load(htmlWeb) it gives me the error: URI formats are not supported. But when I call LoadHtml it seems to work. The main problem is in the line htmlText = myBR.InnerText, which raises the error: Object reference not set to an instance of an object. Here is what I wrote:

Dim htmlWeb As String = URL
Dim htmlDoc As HtmlAgilityPack.HtmlDocument = New HtmlAgilityPack.HtmlDocument
    htmlDoc.LoadHtml(htmlWeb)
Dim htmlText As String
    htmlDoc.OptionFixNestedTags = True
Dim myBR As HtmlNode = htmlDoc.DocumentNode.SelectSingleNode("//body")
    htmlText = myBR().InnerText

Is there anything I need to add in the brackets of myBR? I tried .InnerHtml and it doesn't work either.

Solution

Use the Load() method if you have the URL, and use LoadHtml() if you already have the HTML of the page as a string. It appears that you need the first method in this case:

htmlDoc.Load(htmlWeb)

Since you asked for an example: to get the body tag you can use the simple XPath //body

UPDATE:

I missed the fact that HAP's HtmlDocument, unlike XDocument, can't load directly from a URL. When given a string, its Load() method expects a path to a file on the local machine. To load an HtmlDocument from a URL, use HtmlWeb's Load() method instead. Try it this way:

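' HtmlWeb downloads the page at the URL and parses it into an HtmlDocument.
Dim htmlWeb As New HtmlWeb
Dim htmlDoc As HtmlAgilityPack.HtmlDocument = htmlWeb.Load(URL)
Dim htmlText As String
htmlDoc.OptionFixNestedTags = True
' SelectSingleNode returns a single HtmlNode, so no brackets are needed after myBR.
Dim myBR As HtmlNode = htmlDoc.DocumentNode.SelectSingleNode("//body")
htmlText = myBR.InnerText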
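
The "Object reference not set to an instance of an object" error from the question is consistent with SelectSingleNode returning Nothing, which it does when the XPath matches no node (for example when LoadHtml was given the URL string itself rather than markup). A minimal sketch of a more defensive version follows; the module name, the GetBodyText helper, the example.com URL, and the word pattern are all hypothetical placeholders, not part of the original code:

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports HtmlAgilityPack

Module BodyTextExample
    ' Hypothetical helper: returns the text of <body>, or an empty string
    ' when the downloaded page has no <body> element.
    Function GetBodyText(url As String) As String
        Dim web As New HtmlWeb()
        Dim doc As HtmlDocument = web.Load(url)

        ' SelectSingleNode returns Nothing when the XPath matches no node,
        ' so check before reading InnerText.
        Dim body As HtmlNode = doc.DocumentNode.SelectSingleNode("//body")
        If body Is Nothing Then
            Return String.Empty
        End If

        ' InnerText may still contain entities such as &amp;
        ' DeEntitize converts them back to plain characters.
        Return HtmlEntity.DeEntitize(body.InnerText)
    End Function

    Sub Main()
        ' Placeholder URL and pattern, for illustration only.
        Dim bodyText As String = GetBodyText("https://example.com/")
        Dim positivePattern As String = "\b(good|great)\b"

        ' Matches(...).Count gives the same total as incrementing a counter in a loop.
        Dim positiveCount As Integer =
            Regex.Matches(bodyText, positivePattern, RegexOptions.IgnoreCase).Count
        Console.WriteLine(positiveCount)
    End Sub
End Module
```

Returning an empty string for a missing body is only one possible choice; throwing an exception or reporting the problem to the user may suit the application better.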
Licensed under: CC-BY-SA with attribution