Question

The divs of the HTML page I am targeting look like this:

<div class="white-row1">
  <div class="results">
    <div class="profile">
      <a href="hrefThatIWant.com" class>
        <img src="http://imgsource.jpg" border="0" width="150" height="150" alt>
      </a>
    </div>
   </div>
</div>
<div class="white-row2">
// same content as the div above
</div>

I want to collect the href from each of these divs into a list.

This is my current code:

List<HtmlAnchor> profileDivLinks = (List)htmlPage.getByXPath("//div[@class='profile']//@href"); 
for(HtmlAnchor link:profileDivLinks)
{
    System.out.println(link.getHrefAttribute());
}

This is the error I am receiving (it is thrown on the first line of the for statement):

Exception in thread "main" java.lang.ClassCastException: com.gargoylesoftware.htmlunit.html.DomAttr cannot be cast to com.gargoylesoftware.htmlunit.html.HtmlAnchor 

What do you think the issue is?

Solution

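The issue is that your XPath ends in //@href, so it selects attribute nodes (DomAttr), and your code then casts those attributes to HtmlAnchor. The raw (List) cast hides the mismatch at compile time, which is why the exception only appears when the for loop reads the first element. The smallest change to your code is to modify the XPath so that it returns the anchors themselves: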

htmlPage.getByXPath("//div[@class='profile']//a"); 
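To illustrate the difference between selecting the anchor elements and selecting their href attributes, here is a minimal sketch. It deliberately uses the JDK's built-in javax.xml.xpath API instead of HtmlUnit so that it runs without extra dependencies, and the class name, method names, and sample markup are made up for the example. In HtmlUnit the corresponding types would be HtmlAnchor (read with getHrefAttribute()) and DomAttr (read with getValue()).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of HtmlUnit or of the original question.
public class HrefXPathDemo {

    // Parses a well-formed XML string and evaluates an XPath 1.0 expression on it.
    public static NodeList select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpath, e);
        }
    }

    // Option 1: select the <a> elements, then read href from each element.
    public static List<String> hrefsFromAnchors(String xml) {
        NodeList nodes = select(xml, "//div[@class='profile']//a");
        List<String> hrefs = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element anchor = (Element) nodes.item(i);
            hrefs.add(anchor.getAttribute("href"));
        }
        return hrefs;
    }

    // Option 2: keep the //@href expression, but treat the results as attributes.
    public static List<String> hrefsFromAttributes(String xml) {
        NodeList nodes = select(xml, "//div[@class='profile']//@href");
        List<String> hrefs = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Attr href = (Attr) nodes.item(i);
            hrefs.add(href.getValue());
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Well-formed stand-in for the page in the question.
        String xml = "<root>"
                + "<div class='white-row1'><div class='results'><div class='profile'>"
                + "<a href='hrefThatIWant.com'><img src='http://imgsource.jpg'/></a>"
                + "</div></div></div>"
                + "<div class='white-row2'><div class='results'><div class='profile'>"
                + "<a href='anotherHref.com'><img src='http://imgsource.jpg'/></a>"
                + "</div></div></div>"
                + "</root>";
        System.out.println(hrefsFromAnchors(xml));
        System.out.println(hrefsFromAttributes(xml));
    }
}
```

Both options yield the same href strings. The point is that the type you cast to has to match what the XPath expression selects: elements for //a, attributes for //@href.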

OTHER TIPS

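Try the following XPath (note that data() is an XPath 2.0 function, so it may not be supported if your HtmlUnit version only evaluates XPath 1.0):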

//div[@class='profile']//data(@href)
Licensed under: CC-BY-SA with attribution