Question

Hi, I have an HTML page that I want to query ("scrape") using YQL. I want to get only the text of four columns from the table on that page, and I don't know how to express that in XPath.

I located one of the cells by right-clicking it in Chrome, choosing Inspect Element and then Copy XPath. This is the result I got for that single cell:

//*[@id="partsTable"]/tbody/tr[1]/td[8]/text()

So that is the expression for the 1st row and the 8th column. What I actually want is the content of the 5th, 6th, 8th and 9th columns for all rows. I don't know whether that can be written easily in XPath.

Thanks a lot for the help. (I am completely new to XPath, so an explanation would be appreciated.)


Solution 2

In XPath 2.0 and later, you can test a position against a list of values, much like SQL's IN:

[position() = (5, 6, 8, 9)]

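The general comparison = is true when position() equals any value in the list. To get every row rather than only the first, drop the [1] from tr, so your full expression would be:

//*[@id="partsTable"]/tbody/tr/td[position() = (5, 6, 8, 9)]/text()

XPath 1.0, the version browsers implement, has no sequences, so the expression above is a syntax error there. In XPath 1.0, write the test out instead:

//*[@id="partsTable"]/tbody/tr/td[position() = 5 or position() = 6 or position() = 8 or position() = 9]/text()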
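To see what the predicate picks out, here is a small Python sketch (an illustration, not part of the original answer). It assumes a hypothetical, well-formed two-row, nine-column sample table; real pages are often not well-formed XML. Python's standard-library ElementTree supports only a subset of XPath (attribute predicates, but no position() function), so the sketch emulates td[position() = (5, 6, 8, 9)] by checking each cell's 1-based position against the set {5, 6, 8, 9}.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the real page: 2 rows x 9 columns.
SAMPLE = """
<html><body>
<table id="partsTable"><tbody>
  <tr><td>a1</td><td>a2</td><td>a3</td><td>a4</td><td>a5</td><td>a6</td><td>a7</td><td>a8</td><td>a9</td></tr>
  <tr><td>b1</td><td>b2</td><td>b3</td><td>b4</td><td>b5</td><td>b6</td><td>b7</td><td>b8</td><td>b9</td></tr>
</tbody></table>
</body></html>
"""

WANTED = {5, 6, 8, 9}  # 1-based column positions, as in XPath


def select_columns(html, wanted=WANTED):
    root = ET.fromstring(html.strip())
    rows = []
    # ElementTree understands [@id='...'], so the row path maps directly.
    for tr in root.findall(".//table[@id='partsTable']/tbody/tr"):
        cells = tr.findall("td")
        # XPath positions start at 1, hence enumerate(..., start=1).
        rows.append([td.text for pos, td in enumerate(cells, start=1) if pos in wanted])
    return rows


print(select_columns(SAMPLE))
```

Each inner list holds the text of columns 5, 6, 8 and 9 for one row, which is what the XPath expression returns as a flat list of text nodes.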

OTHER TIPS

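Use position() to query the index of the element. A range test such as this one selects columns 5 through 9:

//*[@id="partsTable"]/tbody/tr/td[5 <= position() and position() <= 9]/text()

Note that the range also includes column 7. To skip it, add "and position() != 7" to the predicate (this works in XPath 1.0 too):

//*[@id="partsTable"]/tbody/tr/td[5 <= position() and position() <= 9 and position() != 7]/text()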
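The predicate is evaluated once per td, with position() being that cell's 1-based index among the selected cells of its row. A quick Python check (an illustration, not an XPath engine) of which positions each predicate keeps, assuming a hypothetical nine-column row, shows why a plain 5..9 range also picks up column 7:

```python
# Each function mirrors one XPath predicate; p plays the role of position().
def plain_range(p):    # 5 <= position() and position() <= 9
    return 5 <= p and p <= 9


def range_minus_7(p):  # 5 <= position() and position() <= 9 and position() != 7
    return 5 <= p and p <= 9 and p != 7


def explicit_or(p):    # position() = 5 or position() = 6 or position() = 8 or position() = 9
    return p == 5 or p == 6 or p == 8 or p == 9


def kept_positions(pred, n_columns=9):
    # XPath positions are 1-based, so check 1..n_columns inclusive.
    return [p for p in range(1, n_columns + 1) if pred(p)]


for pred in (plain_range, range_minus_7, explicit_or):
    print(pred.__name__, kept_positions(pred))
```

The last two predicates keep the same positions; which one to use is a matter of readability.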

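Be careful when copying XPath expressions from browser developer tools: the DOM the browser shows can differ from the HTML the server sends (for example, browsers add tbody elements to tables). Have a look at "Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing?".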
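A minimal Python sketch of the tbody pitfall, assuming a hypothetical, well-formed snippet whose table has no tbody (as raw HTML often does): the path copied from the browser, with its /tbody/ step, matches nothing, while a descendant step (//) finds the rows either way, at the cost of also matching rows of any nested table.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw markup as served: no <tbody>, unlike the browser's DOM.
RAW = (
    '<html><body><table id="partsTable">'
    "<tr><td>1</td><td>2</td></tr>"
    "<tr><td>3</td><td>4</td></tr>"
    "</table></body></html>"
)

root = ET.fromstring(RAW)

# Path as copied from developer tools, including the /tbody/ step.
with_tbody = root.findall(".//table[@id='partsTable']/tbody/tr")
# Descendant step instead of /tbody/: works with or without tbody.
any_depth = root.findall(".//table[@id='partsTable']//tr")

print(len(with_tbody), len(any_depth))
```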

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow