YQL Solution
First off, your XPath query is way too broad. Looking at the wiki page's source, I came up with this:
//div[@id='mw-content-text']/table//table[@class='center']
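Unfortunately, the table that you want doesn't have an ID on it, so selecting tables with a center class was the best I could do. This returns 5 different tables; you want the first one. I tried to use the "first element" predicate (table[@class='center'][1]), but that didn't seem to do anything. That is probably because, after //, the [1] is evaluated per parent rather than over the whole result set, so every table that is the first match under its own parent still gets through. In standard XPath, wrapping the full expression in parentheses and appending [1] selects the first match overall, though YQL may not accept that form.
Notice that the XML in the <results> element is straight XHTML that you could dump into your page (assuming that you're requesting the results as XML, not JSON).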
I found Yahoo's YQL Console really helpful. It lets you fine-tune your query before you write any JavaScript to fetch and parse the results.
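Once the query works in the console, the request URL can be assembled in JavaScript. The following is only a sketch: buildYqlUrl is a hypothetical helper, the page URL is a placeholder, and it assumes the public YQL endpoint and its html table are reachable in the same way as from the console.

```javascript
// Hypothetical helper: builds a request URL for the public YQL endpoint.
// Assumes the `html` table accepts `url` and `xpath` filters, as in the console.
function buildYqlUrl(pageUrl, xpath, format) {
  var query = 'select * from html where url="' + pageUrl +
    '" and xpath="' + xpath + '"';
  return 'https://query.yahooapis.com/v1/public/yql' +
    '?q=' + encodeURIComponent(query) +
    '&format=' + encodeURIComponent(format || 'xml');
}

// The XPath from above; the page URL is a placeholder, not the real wiki page.
var xpath = "//div[@id='mw-content-text']/table//table[@class='center']";
var yqlUrl = buildYqlUrl('https://example.org/wiki/Some_Page', xpath, 'xml');

console.log(yqlUrl);
```

Requesting the xml format keeps the XHTML inside <results> intact; passing 'json' instead would mean converting the result back to HTML yourself.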
jQuery Solution
This isn't the optimal solution, but it avoids having to parse XML in JavaScript or convert JSON to HTML. You can make an AJAX call to get the page's HTML and then strip out everything except the table:
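// Placeholder URL; include the protocol so it isn't treated as a relative path.
var scrapeUrl = 'http://www.example.com';

$.ajax({
    type: "GET",
    url: scrapeUrl,
    success(html) {
        // Wrap the parsed markup in a div so .find() also matches top-level elements.
        var $scrapedElement = $("<div>").append($.parseHTML(html)).find("h1");
        $("#scrapedDataDiv").html($scrapedElement);
    },
    error() {
        alert("Problem getting table");
    }
});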
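In this example, the code downloads the page at www.example.com and scrapes out all of the h1 tags, thanks to jQuery's handy selectors. The h1 tags are then placed in a div with the id of scrapedDataDiv. For the table in your question, you would swap the selector for something like table.center and keep only the first match with .first().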
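Obviously, you still have to deal with cross-domain (same-origin policy) restrictions, since the browser will normally block an AJAX request to a page on another site. You can work around this by setting up a proxy on your own server that fetches the page for you.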
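What such a proxy looks like depends on what your server runs. As a rough sketch only, assuming Node.js (the answer above doesn't depend on it), the code below fetches an allowed page server-side and passes the HTML through. Every name in it (ALLOWED_HOSTS, resolveTarget, createProxy, the url query parameter) is made up for illustration.

```javascript
// Minimal same-origin proxy sketch for Node.js. All names are hypothetical.
var http = require('http');
var https = require('https');

// Only these hosts may be fetched, so the proxy can't be used as an open relay.
var ALLOWED_HOSTS = ['www.example.com'];

// Returns the normalized target URL if the request is acceptable, otherwise null.
function resolveTarget(requestUrl) {
  var target = new URL(requestUrl, 'http://localhost').searchParams.get('url');
  if (!target) return null;
  var parsed;
  try {
    parsed = new URL(target);
  } catch (e) {
    return null; // not an absolute URL
  }
  if (parsed.protocol !== 'https:') return null;
  return ALLOWED_HOSTS.indexOf(parsed.hostname) !== -1 ? parsed.href : null;
}

// Builds the server without starting it; call .listen(port) to run it.
function createProxy() {
  return http.createServer(function (req, res) {
    var target = resolveTarget(req.url);
    if (!target) {
      res.writeHead(400, { 'Content-Type': 'text/plain' });
      res.end('Bad or disallowed url parameter');
      return;
    }
    https.get(target, function (upstream) {
      // Pass the upstream status and body straight through as HTML.
      res.writeHead(upstream.statusCode, { 'Content-Type': 'text/html' });
      upstream.pipe(res);
    }).on('error', function () {
      if (!res.headersSent) {
        res.writeHead(502, { 'Content-Type': 'text/plain' });
      }
      res.end('Upstream request failed');
    });
  });
}
```

If this handler were mounted on the same origin that serves your page, the jQuery call could point scrapeUrl at something like '/proxy?url=' + encodeURIComponent('https://www.example.com/') and the browser would treat it as a same-origin request.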