Question

I am trying to make a website that pulls content from a wiki page and displays it on my own page.

Before anyone says it is illegal to scrape a website: this is a wiki site, and at the bottom of each page it says:

Content is available under Attribution-Noncommercial-Share Alike 3.0 Unported.

This means I am free to use and REUSE the information that is provided there.

This is the wiki page: http://wiki.mabinogiworld.com/

Basically, I want to take the server online status table directly from the wiki and put it on my webpage. I also want to keep it up to date, so the table has to be fetched again each time the webpage is refreshed.

Doing this, I ran into the cross-domain issue. I found that YQL seems to be able to help, but I still can't figure it out.

This is what I did so far:

YUI().use("yql", function (Y) 
{
    var query = 'SELECT * FROM html WHERE url="http://wiki.mabinogiworld.com/" and xpath="//div/table"';

    Y.YQL(query, function(results) 
    {
        var temp;
        var size = 0;
        temp = results.query.results.table;
        size = temp.length;

        for (var i = 0; i < size; i++) 
        {
            //Loop through the result and find the exact table I want
        }
    });
});

With the above code (the loop was too messy, so I cut it out) I am able to get the exact table I want, with all of its columns and rows, but it is returned in a structure that I have no idea how to translate back into HTML.

What can I do to get the table from the wiki page and put it onto my webpage? And what is the type of the "results" variable anyway? I can't seem to do anything with it other than access its properties.

Thank you.


Solution

Try something like what is posted here: "YQL JSON script not returning?"

Basically, it makes cross-domain AJAX possible with the help of YQL.

Source: http://net.tutsplus.com/tutorials/javascript-ajax/quick-tip-cross-domain-ajax-request-with-yql-and-jquery/


If you really want to control the formatting and style of the table, make your own table with your own styles, then extract the data from the YQL result and use it to populate that table. That way it can be done with the method you already have. YQL is really useful; I started playing with it a bit and find it very powerful.

I'm not sure whether that would violate the copyright rules, though, since you are reusing the data in your own format.
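The approach above (build your own table and fill it from the YQL data) could be sketched roughly as follows. The "results" value passed to the callback is an ordinary JavaScript object parsed from JSON, so it can be walked with normal property access. The exact shape of the table object is an assumption here: the sketch supposes rows live under tr, cells under td, and that a cell is either a plain string or an object with a content property. The function names and the server-status class are made up for illustration, so inspect the real result before relying on any of this.

```javascript
// Escape text so scraped content cannot inject markup into your page.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// JSON produced from XML often yields a single object for one child and an
// array for several, so normalise everything to an array first.
function toArray(value) {
  if (value === undefined || value === null) {
    return [];
  }
  return Array.isArray(value) ? value : [value];
}

// ASSUMPTION: a cell is either a plain string or an object whose text sits
// in a `content` property. Anything else is rendered as an empty cell.
function cellText(cell) {
  if (typeof cell === "string") {
    return cell;
  }
  if (cell && typeof cell.content === "string") {
    return cell.content;
  }
  return "";
}

// Build an HTML string for one table object taken from the YQL result.
// "server-status" is a hypothetical class name for your own styling.
function tableToHtml(table) {
  var rows = toArray(table.tr).map(function (tr) {
    var cells = toArray(tr.td).map(function (td) {
      return "<td>" + escapeHtml(cellText(td).trim()) + "</td>";
    });
    return "<tr>" + cells.join("") + "</tr>";
  });
  return '<table class="server-status">' + rows.join("") + "</table>";
}
```

Inside the YQL callback, once the loop has found the right table object, something like document.getElementById("status").innerHTML = tableToHtml(thatTable) would put it on the page, where "status" is a hypothetical element id.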

Autres conseils

YQL Solution

First off, your XPath query is way too broad. Looking at the wiki page's source, I came up with this:

//div[@id='mw-content-text']/table//table[@class='center']

Unfortunately, the table that you want doesn't have an ID on it, so selecting tables with a center class was the best I could do. This returns 5 different tables; you want the first one. I tried to use the "first element" predicate (table[@class='center'][1]), but that didn't seem to do anything. Notice that the XML in the <results> element is straight XHTML that you could dump into your page. (That's assuming that you're requesting the results as XML, not JSON)

I found Yahoo's YQL Console really helpful. It allows you to fine tune your query before trying to incorporate it with Javascript to parse the results.
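One likely reason the [1] predicate appeared to do nothing: in XPath, //table[@class='center'][1] selects every matching table that is the first such table within its own parent, not the first match in the whole document. Since it is unclear whether YQL accepts the parenthesised form, a simpler route is to keep the query as is and take the first result in JavaScript. The sketch below builds the request URL and picks the first match. The endpoint constant is an assumption (the public YQL REST address as it was documented while the service was available), and buildYqlUrl and firstResult are illustrative names, not part of any library.

```javascript
// ASSUMPTION: public YQL REST endpoint as documented while the service existed.
var YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql";

// Build the request URL for a page and an XPath expression.
// `format` is "xml" (XHTML you can insert directly) or "json".
function buildYqlUrl(pageUrl, xpath, format) {
  var query = 'SELECT * FROM html WHERE url="' + pageUrl +
              '" AND xpath="' + xpath + '"';
  return YQL_ENDPOINT +
    "?q=" + encodeURIComponent(query) +
    "&format=" + encodeURIComponent(format);
}

// A single match may arrive as one object and several matches as an array,
// so handle both when taking the first one.
function firstResult(value) {
  return Array.isArray(value) ? value[0] : value;
}

var url = buildYqlUrl(
  "http://wiki.mabinogiworld.com/",
  "//div[@id='mw-content-text']/table//table[@class='center']",
  "xml"
);
```

With the JSON format, firstResult(results.query.results.table) would then give the first of the five tables.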


jQuery Solution

This isn't the optimal solution, but it circumvents the need to parse XML in Javascript or convert JSON to HTML. You can do an AJAX call to get the HTML and then strip out everything besides the table:

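// A scheme is required; without it the URL is treated as a relative path.
var scrapeUrl = 'http://www.example.com/';
$.ajax({
  type: "GET",
  url: scrapeUrl,
  dataType: "html",
  success: function (html) {
    // Wrap the parsed markup in a container so that .find() also
    // matches elements at the top level of the document.
    var $scrapedElement = $("<div>").append($.parseHTML(html)).find("h1");
    $("#scrapedDataDiv").html($scrapedElement);
  },
  error: function () {
    alert("Problem getting table");
  }
});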

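In this example, the code downloads the page at www.example.com and extracts all of the h1 tags, thanks to jQuery's handy selectors. The h1 tags are then placed in a div with the id scrapedDataDiv. For the wiki, you would swap the h1 selector for one that matches the status table.

Obviously, you still have to deal with the same-origin policy, which blocks cross-domain AJAX requests from the browser. You can work around this by setting up a proxy on your own server and pointing the AJAX call at it.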
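As a rough illustration of such a proxy, here is a minimal sketch using Node.js. Everything specific in it is an assumption: it presumes a Node version with the built-in fetch function (18 or later), and the /proxy path, the url parameter, and the port are made-up choices. It only forwards requests for one allowed host, so that the proxy cannot be used to fetch arbitrary sites.

```javascript
const http = require("http");

// Hypothetical allow-list: the only host this proxy will fetch from.
const ALLOWED_HOST = "wiki.mabinogiworld.com";

// Accept only http(s) URLs that point at the allowed host.
function isAllowedTarget(target) {
  try {
    const url = new URL(target);
    const okProtocol = url.protocol === "http:" || url.protocol === "https:";
    return okProtocol && url.hostname === ALLOWED_HOST;
  } catch (err) {
    return false; // not a parseable absolute URL
  }
}

// Create a server that answers GET /proxy?url=... by relaying the page.
function createProxy() {
  return http.createServer(async function (req, res) {
    const requestUrl = new URL(req.url, "http://localhost");
    const target = requestUrl.searchParams.get("url");
    if (requestUrl.pathname !== "/proxy" || !target || !isAllowedTarget(target)) {
      res.writeHead(400, { "Content-Type": "text/plain" });
      res.end("Bad or disallowed request");
      return;
    }
    try {
      const upstream = await fetch(target); // built-in fetch, Node 18+
      const body = await upstream.text();
      res.writeHead(upstream.status, { "Content-Type": "text/html; charset=utf-8" });
      res.end(body);
    } catch (err) {
      res.writeHead(502, { "Content-Type": "text/plain" });
      res.end("Upstream request failed");
    }
  });
}

// To run it (port 8080 is an arbitrary choice):
// createProxy().listen(8080);
```

If the proxy is served from the same origin as your page, the jQuery call above could then use a scrapeUrl such as "/proxy?url=" + encodeURIComponent("http://wiki.mabinogiworld.com/").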

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow