I'm trying to extract all the station names that appear in the left frame of http://www.raws.dri.edu/wraws/orF.html using HTML Agility Pack.

My XPath string is currently //frame[@name='list']. At this point it returns the node, but I can't seem to access any of its child nodes. Ultimately I'm trying to return all the attributes that are in frameset[1]/html/body/[@a], which look something like this:

<a onmouseover="popup('<font color=Black><strong> IDARNG1 RG2  Idaho (RAWS) </strong>    </font> ',615,307);update('IDARNG1 RG2  Idaho (RAWS)',615,307,'idIAN1','raw');return true;"  onmouseout="removeBox();removedot();" href="/cgi-bin/rawMAIN.pl?idIAN1">

Solution

Here is what the browser is currently doing:

  • It opens http://www.raws.dri.edu/wraws/orF.html
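  • It parses the source code and performs another request for every <frame> that appears in it. The frame's content is a separate document, which is why the <frame> node has no child nodes in the page you loaded.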

That means you need to request the URL the <frame> points to yourself. That URL is in the frame's src attribute. Below is an example:

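// HtmlNode exposes GetAttributeValue(name, default); the src value is relative to the page URL.
string src = doc.DocumentNode.SelectSingleNode("//frame[@name='list']").GetAttributeValue("src", "");
string url = "http://www.raws.dri.edu/wraws/" + src;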

The URL you're looking for is:

http://www.raws.dri.edu/wraws/orlst.html

Go and open it manually and you will see only the left sidebar is loaded.
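To tie the steps together, here is a minimal sketch of the whole round trip: load the frameset page, resolve the frame's src against the page URL, load that second document, and list its links. The class name FrameScraper and its two helper methods are hypothetical names introduced for illustration. It also assumes the station name is available as the link text; if it is not, the name would have to be parsed out of the onmouseover attribute instead.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class FrameScraper // hypothetical helper class, not part of HtmlAgilityPack
{
    // Resolve the (possibly relative) src of the named <frame> against the page URL.
    public static Uri ResolveFrameUrl(HtmlDocument page, Uri pageUri, string frameName)
    {
        HtmlNode frame = page.DocumentNode.SelectSingleNode("//frame[@name='" + frameName + "']");
        if (frame == null) return null;
        string src = frame.GetAttributeValue("src", "");
        return src.Length == 0 ? null : new Uri(pageUri, src);
    }

    // Collect (link text, href) pairs for every <a href> in the frame's own document.
    public static List<KeyValuePair<string, string>> ExtractLinks(HtmlDocument frameDoc)
    {
        var links = new List<KeyValuePair<string, string>>();
        HtmlNodeCollection anchors = frameDoc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null) return links; // SelectNodes returns null when nothing matches
        foreach (HtmlNode a in anchors)
        {
            // Assumption: the visible link text carries the station name.
            string text = HtmlEntity.DeEntitize(a.InnerText).Trim();
            links.Add(new KeyValuePair<string, string>(text, a.GetAttributeValue("href", "")));
        }
        return links;
    }

    public static void Main()
    {
        var pageUri = new Uri("http://www.raws.dri.edu/wraws/orF.html");
        var web = new HtmlWeb();

        // First request: the frameset page, which only contains <frame> tags.
        HtmlDocument page = web.Load(pageUri.ToString());
        Uri frameUri = ResolveFrameUrl(page, pageUri, "list");
        if (frameUri == null)
        {
            Console.WriteLine("Frame 'list' not found.");
            return;
        }

        // Second request: the document the frame points to.
        HtmlDocument frameDoc = web.Load(frameUri.ToString());
        foreach (KeyValuePair<string, string> link in ExtractLinks(frameDoc))
            Console.WriteLine(link.Key + " -> " + link.Value);
    }
}
```

Resolving with new Uri(pageUri, src) rather than string concatenation keeps the code working if the src turns out to be an absolute or root-relative path.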

Next time, use an HTTP debugging tool such as Firebug or Fiddler to see which requests the browser makes behind the scenes.

Licensed under: CC-BY-SA with attribution