Question

I'm writing XPaths to select all the links under each category in the left sidebar of the following page: http://www.indexmundi.com/commodities/

I want to select the links under each category one by one. I've written the following XPath, and it somehow selects the links under the first category (Commodity Price Indices), but I don't know how to select the links under the other categories. I want to add a check on the h3: if its text is Energy, count and select all the rows before it; then, if the h3 text is Beverages, count and select all the rows between Energy and Beverages.

.//*[@id='dlCommodities']/tbody/tr[position()< count(following-sibling::tr/td/h3)-1]/td/a

Here is another XPath:

.//*[@id='dlCommodities']/tbody/tr[preceding-sibling::tr/td/h3[. = 'Energy'] and following-sibling::tr/td/h3[. = 'Beverages']]/td/a

It fulfils the second requirement, i.e. it selects the rows between two specific headings, but it misses one node.

Please help me fix these xpaths or suggest a better one.

Thanks


Solution

I understand your actual problem as: find all links that belong to a given category. To do so, find the category's heading row, and then retrieve all following rows up to the next category.

You may remove the newlines if you prefer; I added them for readability.

//tr[td/h3="Energy"]/(self::tr, following-sibling::tr[
  . << //tr[td/h3="Energy"]/following-sibling::tr[td/h3][1]
])
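
To check the slicing logic outside an XPath 2.0 processor, here is a rough Python sketch using only the standard library's xml.etree.ElementTree. Its limited XPath subset has no following-sibling axis, so the sibling walk is done by hand. It aims to select the same rows as the expression above: the heading row plus the following rows up to the next heading. The table markup and the helper names heading_of and rows_in_category are hypothetical, modelled on the question's description rather than on the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled on the question: heading rows contain td/h3,
# item rows contain td/a. The real indexmundi page may be structured differently.
SAMPLE = """<table id="dlCommodities"><tbody>
<tr><td><h3>Commodity Price Indices</h3></td></tr>
<tr><td><a href="/a">Index A</a></td></tr>
<tr><td><h3>Energy</h3></td></tr>
<tr><td><a href="/crude">Crude Oil</a></td></tr>
<tr><td><a href="/coal">Coal</a></td></tr>
<tr><td><h3>Beverages</h3></td></tr>
<tr><td><a href="/coffee">Coffee</a></td></tr>
</tbody></table>"""


def heading_of(row):
    """Return the row's h3 text if it is a category heading row, else None."""
    h3 = row.find("td/h3")
    return (h3.text or "").strip() if h3 is not None else None


def rows_in_category(tbody, category):
    """Return the heading row for `category` plus every following sibling row
    up to (not including) the next heading row."""
    rows = tbody.findall("tr")
    starts = [i for i, r in enumerate(rows) if heading_of(r) == category]
    if not starts:
        return []  # no such category
    start = starts[0]
    # Index of the next heading row; if there is none, run to the end of the table.
    end = next(
        (i for i in range(start + 1, len(rows)) if heading_of(rows[i]) is not None),
        len(rows),
    )
    return rows[start:end]


root = ET.fromstring(SAMPLE)
tbody = root.find("tbody")
energy = rows_in_category(tbody, "Energy")
links = [a.text for row in energy for a in row.findall("td/a")]
print(links)  # ['Crude Oil', 'Coal']
```

One deliberate difference: for the last category there is no next heading, so `. << ()` yields an empty sequence and the XPath 2.0 expression keeps only the heading row; the sketch instead runs to the end of the table.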

If you do not have an XPath 2.0 compatible processor, you cannot use the << operator, which tests for node order (the current node must precede the next category's heading row). An XPath 1.0 solution is of similar length, but in my opinion less readable:

//tr[td/h3="Energy"] | //tr[td/h3="Energy"]/following-sibling::tr[
  ./preceding-sibling::tr[td/h3][1][td/h3="Energy"] and not(td/h3)
]
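
The XPath 1.0 predicate amounts to grouping every non-heading row under its nearest preceding heading. A minimal Python sketch of that idea, again standard library only and on hypothetical markup, builds all the groups in one pass, which also covers the question's wish to take the categories one by one; len() then gives the number of links in each. links_by_category is an illustrative name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled on the question (td/h3 headings, td/a links);
# the real page may differ.
SAMPLE = """<table id="dlCommodities"><tbody>
<tr><td><h3>Commodity Price Indices</h3></td></tr>
<tr><td><a href="/a">Index A</a></td></tr>
<tr><td><h3>Energy</h3></td></tr>
<tr><td><a href="/crude">Crude Oil</a></td></tr>
<tr><td><a href="/coal">Coal</a></td></tr>
<tr><td><h3>Beverages</h3></td></tr>
<tr><td><a href="/coffee">Coffee</a></td></tr>
</tbody></table>"""


def links_by_category(tbody):
    """Map each h3 heading to the link texts of the non-heading rows whose
    nearest preceding heading row is that heading (the XPath 1.0 test)."""
    groups = {}
    current = None  # text of the nearest preceding heading seen so far
    for row in tbody.findall("tr"):
        h3 = row.find("td/h3")
        if h3 is not None:
            # A heading row starts a new group; it is not itself a link row.
            current = (h3.text or "").strip()
            groups.setdefault(current, [])
        elif current is not None:
            groups[current].extend(a.text for a in row.findall("td/a"))
    return groups


root = ET.fromstring(SAMPLE)
groups = links_by_category(root.find("tbody"))
for category, category_links in groups.items():
    print(category, len(category_links), category_links)
```

Unlike the XPath expression, each group here holds only the link texts, not the heading row itself.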

Both queries select all rows of a category, including its heading row; to count them, wrap the expression in count(...).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow