First thing I see is how you're retrieving the group of <li> nodes. Just looking at your @class attribute, you can't really tell how many spaces are in "featured block twoblock boxshadow", but that XPath will only return a result if the attribute value is exactly equal to that string, whitespace included. In that regard, try using something more flexible like contains(), i.e. //li[contains(@class, 'featured block')].
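To illustrate the difference between the exact comparison and contains(), here is a minimal sketch using the JDK's built-in javax.xml.xpath rather than HtmlUnit. The markup, the class name ClassMatchDemo and the count helper are made up for the example; the only thing it assumes about your page is that the real class value might carry an extra or trailing space.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ClassMatchDemo {
    // Hypothetical markup: the class value has a double space and a trailing space.
    static final String XML = "<ul>"
            + "<li class=\"featured block  twoblock boxshadow \">A</li>"
            + "<li class=\"plain\">B</li>"
            + "</ul>";

    // Counts the nodes that the given XPath expression selects in the sample markup.
    static int count(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Exact comparison: fails because the whitespace differs.
        System.out.println(count("//li[@class='featured block twoblock boxshadow']")); // prints 0
        // Substring test: still matches.
        System.out.println(count("//li[contains(@class, 'featured block')]"));         // prints 1
    }
}
```

Keep in mind that contains() is a plain substring test, so it would also match a class such as "unfeatured blocks"; pick a fragment that is distinctive enough for the page you're scraping.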
Without seeing what source you're targeting I can't suggest much more, but will update the answer when it's added to the question.
I've tried your XPath (just the /div part, since that's what was provided) on the given snippet and got back <span class="tel" itemprop="telephone"/> as a result. Looks like an issue with how you're retrieving the <li> company nodes.
Update 2:
From your updated XML snippet, your first XPath, //li[@class='featured block twoblock boxshadow'], doesn't look like it will match the parent <li> node, because of the whitespace issue I mentioned before. Secondly, even if it did, you are checking the <li> node's attributes twice in separate queries, and assuming that the data-pvd-p value (starts at 3 in the snippet) will always match the list index (starts at 0, with your +1 added). I'd suggest removing the //li[@data-pvd-p='"+j+1+"'] portion and beginning with the //div.
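As a rough illustration of why the index-based predicate is fragile, here is a small sketch using the JDK's javax.xml.xpath. The markup and the names IndexMismatchDemo, predicateAsQuoted and predicateWithParentheses are hypothetical. It also assumes j is an int and that the expression is written exactly as quoted, in which case "..." + j + 1 concatenates the two numbers as text instead of adding them.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class IndexMismatchDemo {
    // Hypothetical list: data-pvd-p starts at 3, as in the snippet from the question.
    static final String XML = "<ul>"
            + "<li data-pvd-p=\"3\">first</li>"
            + "<li data-pvd-p=\"4\">second</li>"
            + "</ul>";

    // Same concatenation as quoted: j is appended to the string, then 1 is appended.
    static String predicateAsQuoted(int j) {
        return "//li[@data-pvd-p='" + j + 1 + "']";
    }

    // Parenthesized, so j + 1 is evaluated as integer addition first.
    static String predicateWithParentheses(int j) {
        return "//li[@data-pvd-p='" + (j + 1) + "']";
    }

    // Counts the nodes that the given XPath expression selects in the sample markup.
    static int count(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (int j = 0; j < 2; j++) {
            System.out.println(predicateAsQuoted(j) + " -> " + count(predicateAsQuoted(j)));
            System.out.println(predicateWithParentheses(j) + " -> " + count(predicateWithParentheses(j)));
        }
    }
}
```

With this sample data neither variant selects anything: the quoted form asks for '01' and '11', and the parenthesized form asks for '1' and '2', while the attribute values are 3 and 4.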
So something like this:
List<DomNode> companies = (List<DomNode>) page.getByXPath("//li[contains(@class, 'featured block')]");
for (DomNode node : companies) {
    // retrieve telephone number (relative path, evaluated against the current <li>)
    DomNode telephone = (DomNode) node.getByXPath(
            "div[@class='listingWrapper']/div[@class='itemInfo']/span[@class='tel']").get(0);