It turns out there is no tbody tag in the HTML the server sends; it is added by the browser. So basically, the XPath recommended by Chrome is correct for the browser's DOM but wrong for the page as it is actually served.
library(httr)
grepl("table",content(GET(url),type="text"))
# [1] TRUE
grepl("tbody",content(GET(url),type="text"))
# [1] FALSE
Note: This is in NO WAY a recommendation to use regular expressions to parse HTML!
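For completeness, the same check can be done with the parser instead of a regex. This is a minimal sketch with the XML package; the HTML string is a made-up stand-in for the real server response (the `id` and the contents are hypothetical):

```r
library(XML)

# Hypothetical stand-in for content(GET(url), type = "text"):
# a table written without an explicit tbody
src <- "<html><body><div id='content'>
<table>
<tr><th>No.</th><th>President</th></tr>
<tr><td>4</td><td><b><a href='#'>James Madison</a></b></td></tr>
</table>
</div></body></html>"

doc <- htmlParse(src, asText = TRUE)

# Count elements in the parsed tree rather than grepping the raw text
length(getNodeSet(doc, "//table"))  # 1
length(getNodeSet(doc, "//tbody"))  # 0
```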
The problem arises because browsers are designed to be relatively forgiving of improperly formatted HTML. If a tag is unambiguously missing, the browser adds it: for example, if you send a page without a body tag, it still renders, because the browser adds the tag to the DOM while building the page. htmlParse(...) doesn't do this for tbody: it parses the server response as-is, so a tbody that isn't in the source isn't in the parsed tree either. In the HTML 4 spec every table has a tbody element, even though its start and end tags may be omitted in the markup, so the browser inserts it. See this post for an explanation.
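To see the difference in isolation, here is a small sketch that parses two hypothetical versions of the same table, one with tbody written out and one without, and evaluates a browser-style XPath against both. `value_at` is a made-up helper for this example, not part of the XML package:

```r
library(XML)

# Two hypothetical versions of the same table
with_tbody    <- "<table id='t'><tbody><tr><td>James Madison</td></tr></tbody></table>"
without_tbody <- "<table id='t'><tr><td>James Madison</td></tr></table>"

# A browser-style XPath: copied from the DOM, so it always contains tbody
browser_xpath <- "//table[@id='t']/tbody/tr[1]/td[1]"

# Hypothetical helper: parse an HTML string and return the text at an XPath
# (NULL when nothing matches)
value_at <- function(src, xpath) {
  doc <- htmlParse(src, asText = TRUE)
  unlist(xpathSApply(doc, xpath, xmlValue))
}

value_at(with_tbody, browser_xpath)     # "James Madison": the tbody is in the source
value_at(without_tbody, browser_xpath)  # NULL: htmlParse did not insert a tbody
```

Only the first call finds the cell, which is the same situation as with the page in the question: the XPath fits the browser's DOM, not the parsed source.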
So one way to deal with this in a "semi-automatic" way is:
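# html: the page parsed with htmlParse(...); xpath as copied from Chrome (includes tbody)
xpath <- "//*[@id='mw-content-text']/table[1]/tbody/tr[7]/td[2]/b/a"
# if the parsed document contains no tbody at all, drop that step from the path
if (length(html["//tbody"]) == 0) xpath <- gsub("/tbody", "", xpath, fixed = TRUE)
xpathSApply(html, xpath, xmlValue)
# [1] "James Madison"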
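An alternative to rewriting the XPath with gsub (not part of the approach above) is a single XPath that matches either layout, using the XPath union operator `|`. This is a sketch under the same assumptions: the pages are hypothetical, and `tbody_optional` is a made-up helper:

```r
library(XML)

# Hypothetical helper: join two alternatives with the XPath union operator "|",
# one going through tbody and one going straight from the table to its rows
tbody_optional <- function(table_path, row_path) {
  paste0(table_path, "/tbody/", row_path, " | ", table_path, "/", row_path)
}

xpath <- tbody_optional("//*[@id='t']/table[1]", "tr[2]/td[2]/b/a")

# Two hypothetical pages: the same table without and with an explicit tbody
rows <- "<tr><th>No.</th><th>Name</th></tr><tr><td>4</td><td><b><a href='#'>James Madison</a></b></td></tr>"
pages <- c(
  without = paste0("<div id='t'><table>", rows, "</table></div>"),
  with    = paste0("<div id='t'><table><tbody>", rows, "</tbody></table></div>")
)

# Evaluate the same XPath against both pages
results <- sapply(pages, function(src) {
  doc <- htmlParse(src, asText = TRUE)
  xpathSApply(doc, xpath, xmlValue)
})
results  # named character vector; both entries are "James Madison"
```

One caveat: if the table has a thead or several tbody sections, row positions such as tr[2] are counted within each section, so the two branches may not point at the same row.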