Stripping content from a website to my website

Question 1

Try the approach posted in this question: "YQL JSON script not returning?"

Basically, it makes cross-domain AJAX possible with the help of YQL.

Source: http://net.tutsplus.com/tutorials/javascript-ajax/quick-tip-cross-domain-ajax-request-with-yql-and-jquery/
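
For what it's worth, here is a rough sketch of the kind of request that tutorial describes. The endpoint and the select statement follow YQL's public API as I remember it. The names buildYqlUrl and fetchViaYql are made up for illustration, and the sketch assumes that jQuery is loaded and that the YQL service is reachable.

```javascript
// Hypothetical helper: builds a request URL for YQL's public endpoint.
// pageUrl is the page to fetch; xpath is optional and narrows the result.
function buildYqlUrl(pageUrl, xpath) {
  var query = 'select * from html where url="' + pageUrl + '"';
  if (xpath) {
    query += ' and xpath="' + xpath + '"';
  }
  return 'https://query.yahooapis.com/v1/public/yql' +
    '?q=' + encodeURIComponent(query) +
    '&format=xml&callback=?';
}

// Hypothetical helper: fetches a page through YQL and hands the HTML to onHtml.
// Assumes jQuery is available as $ (browser only).
function fetchViaYql(pageUrl, onHtml) {
  // "callback=?" makes jQuery issue a JSONP request, which is what
  // gets around the same-origin restriction.
  $.getJSON(buildYqlUrl(pageUrl), function (data) {
    // Assumed response shape: { results: ["<html string>", ...] }
    onHtml(data.results && data.results[0]);
  });
}
```

Calling fetchViaYql('http://www.example.com', function (html) { ... }) from the page would then give you the remote markup as a string.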

Well, if you really want to control the formatting and style of the table, make your own table, apply your own styles to it, and then extract the data through YQL and use it to populate the table. That way it can be done with your method. YQL is really useful; I started playing with it a bit and find it very powerful.
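
As a minimal sketch of the "populate your own table" idea: assuming you have already pulled the cell text out of the YQL results into plain arrays, something like the following would build the markup. The helper names and the my-table class are made up for illustration.

```javascript
// Escape text so scraped values cannot inject markup into your page.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: headers is an array of column titles,
// rows is an array of arrays of cell text.
function buildTableHtml(headers, rows) {
  var head = '<tr>' + headers.map(function (h) {
    return '<th>' + escapeHtml(h) + '</th>';
  }).join('') + '</tr>';
  var body = rows.map(function (row) {
    return '<tr>' + row.map(function (cell) {
      return '<td>' + escapeHtml(cell) + '</td>';
    }).join('') + '</tr>';
  }).join('');
  // "my-table" is a placeholder class to hang your own styles on.
  return '<table class="my-table"><thead>' + head +
    '</thead><tbody>' + body + '</tbody></table>';
}
```

You would then style .my-table in your own CSS and drop the returned string into the page, for example with $("#someDiv").html(buildTableHtml(headers, rows)).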

I'm not sure whether that would violate copyright, though, since you are reusing the data in your own format.

Question 2

YQL Solution

First off, your XPath query is way too broad. Looking at the wiki page's source, I came up with this:

//div[@id='mw-content-text']/table//table[@class='center']

Unfortunately, the table that you want doesn't have an ID on it, so selecting tables with a center class was the best I could do. This returns 5 different tables; you want the first one. I tried to use the "first element" predicate (table[@class='center'][1]), but that didn't seem to do anything. Notice that the XML in the <results> element is straight XHTML that you could dump into your page. (That's assuming that you're requesting the results as XML, not JSON.)
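
A possible explanation for the [1] predicate not helping: in XPath, //table[@class='center'][1] means "the first matching table within each parent", so tables that sit under different parents all still match. Wrapping the whole expression in parentheses before applying [1] is the standard way to ask for the first match overall, but I have not verified that YQL accepts that form. The safer route is to pick the first table on the client. The sketch below assumes the response arrives as an object with a results array holding one string per matched element; firstTableHtml is a made-up name.

```javascript
// XPath from the answer: every table with class "center" under the content div.
var TABLES_XPATH = "//div[@id='mw-content-text']/table//table[@class='center']";

// Standard XPath for "first match overall"; untested against YQL (assumption).
var FIRST_TABLE_XPATH = "(" + TABLES_XPATH + ")[1]";

// Hypothetical helper. Assumes the response looks like
// { results: ["<table>...</table>", ...] }, one string per matched element.
function firstTableHtml(data) {
  var results = (data && data.results) || [];
  return results.length > 0 ? results[0] : null;
}
```

With jQuery on the page, the returned string could go straight into a container, for example $("#scrapedDataDiv").html(firstTableHtml(data)).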

I found Yahoo's YQL Console really helpful. It lets you fine-tune your query before you try to incorporate it into JavaScript that parses the results.

jQuery Solution

This isn't the optimal solution, but it circumvents the need to parse XML in JavaScript or convert JSON to HTML. You can do an AJAX call to get the HTML and then strip out everything besides the table:

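// Page to scrape; include the protocol, or the URL is treated as a relative path.
var scrapeUrl = 'http://www.example.com';
$.ajax({
  type: "GET",
  url: scrapeUrl,
  dataType: "html",
  success(html) {
    // Wrap the parsed nodes in a container so .find() also matches top-level elements.
    var $page = $("<div>").append($.parseHTML(html));
    var $scrapedElement = $page.find("h1");
    $("#scrapedDataDiv").empty().append($scrapedElement);
  },
  error() {
    alert("Problem getting table");
  }
});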

In this example, the code downloads the page at www.example.com and scrapes out all of the h1 tags, thanks to jQuery's handy selectors. The h1 tags are then placed in a div with the id of scrapedDataDiv.

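Obviously, you still have to deal with cross-domain (same-origin policy) restrictions, because the browser won't let a script read a response from another domain. You can get around this by setting up a proxy on your own server, so the page only ever requests data from its own origin.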
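
To give an idea of what such a proxy might look like, here is a minimal sketch in Node.js. It assumes Node 18 or later (for the built-in fetch) and that your page is served from the same origin as the proxy. The allowlist, the url query parameter, and the function names are all made up for illustration; any server-side language would work just as well.

```javascript
var http = require('http');

// Hypothetical allowlist: only these hosts may be fetched through the proxy.
var ALLOWED_HOSTS = ['www.example.com'];

// Returns true only for http(s) URLs whose host is on the allowlist.
function isAllowedTarget(target) {
  try {
    var parsed = new URL(target);
    return (parsed.protocol === 'http:' || parsed.protocol === 'https:') &&
      ALLOWED_HOSTS.indexOf(parsed.hostname) !== -1;
  } catch (e) {
    return false; // not a valid absolute URL
  }
}

// Creates (but does not start) a server that relays GET ...?url=<target>.
function createProxyServer() {
  return http.createServer(function (req, res) {
    var target = new URL(req.url, 'http://localhost').searchParams.get('url');
    if (!target || !isAllowedTarget(target)) {
      res.writeHead(400, { 'Content-Type': 'text/plain' });
      res.end('Target not allowed');
      return;
    }
    fetch(target)
      .then(function (upstream) { return upstream.text(); })
      .then(function (html) {
        res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
        res.end(html);
      })
      .catch(function () {
        res.writeHead(502, { 'Content-Type': 'text/plain' });
        res.end('Upstream request failed');
      });
  });
}

// Usage: createProxyServer().listen(3000);
```

The jQuery call would then point at your own server (something like /proxy?url=http%3A%2F%2Fwww.example.com) instead of the remote site. The allowlist is there so the proxy can't be used to fetch arbitrary URLs.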