Question

I'm trying to find out if there's a wikipedia api (I Think it is related to the mediawiki?).

If so, I would like to know how I would tell wikipedia to give me an article about the new york yankees for example.

What would the REST url be for this example?

All the docs on this subject seem fairly complicated.

Was it helpful?

Solution

You really really need to spend some time reading the documentation, as this took me a moment to look and click on the link to fix it. :/ but out of sympathy i'll provide you a link that maybe you can learn to use.

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=New_York_Yankees&rvprop=timestamp|user|comment|content

That's the variabled you will be looking to get. Your best bet is to know the page you will be after and replace the Wikipedia link part into the title i.e.:

http://en.wikipedia.org/wiki/New_York_Yankees [Take the part after the wiki/]

-->

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=New_York_Yankees&rvprop=timestamp|user|comment|content

[Place it in the title variable of the GET request.

The URL above can do with tweaking to get the different sections you do or do not want. So read the documentation :)

OTHER TIPS

The answers here helped me arrive at a solution, but I discovered more info in the process which may be of advantage to others who find this question. I figure most people simply want to use the API to quickly get content off the page. Here is how I'm doing that:

Using Revisions:

//working url:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Threadless&rvprop=content&format=json&rvsection=0&rvparse=1

//Explanation
//Base Url:
http://en.wikipedia.org/w/api.php?action=query

//tell it to get revisions:
&prop=revisions

//define page titles separated by pipes. In the example i used t-shirt company threadless
&titles=whatever|the|title|is

//specify that we want the page content
&rvprop=content

//I want my data in JSON, default is XML
&format=json

//lets you choose which section you want. 0 is the first one.
&rvsection=0

//tell wikipedia to parse it into html for you
&rvparse=1

Using Extracts (better/easier for what i'm doing)

//working url:
http://en.wikipedia.org/w/api.php?action=query&prop=extracts&titles=Threadless&format=json&exintro=1

//only explaining new parameters
//instead of revisions, we'll set prop=extracts
&prop=extracts

//if we just want the intro, we can use exintro. Otherwise it shows all sections
&exintro=1

All the info requires reading through the API documentation as was mentioned, but I hope these examples will help the majority of the people who come here for a quick fix.

See http://www.mediawiki.org/wiki/API

Specifically, for the English Wikipedia, API is located at http://en.wikipedia.org/w/api.php

Have a look at the ApiSandbox at https://en.wikipedia.org/wiki/Special:ApiSandbox That is a web frontend to easily query the API. A few clicks will craft you the URL and show you the API result.

That is an extension for MediaWiki, enabled on all Wikipedia languages. https://www.mediawiki.org/wiki/Extension:ApiSandbox

If you want to extract structured data from Wikipedia, you may consider using DbPedia http://dbpedia.org/

It provides means to query data using given criteria using SPARQL and returns data from parsed Wikipedia infobox templates

There are some SPARQL libraries available for multiple platforms to make queries easier

If you want to extract structured data from Wikipedia, you may also try http://www.wikidata.org/wiki/Wikidata:Main_Page

Below is a working example that prints the first sentence from Wikipedias New York Yankees page to your web browsers console:

<!DOCTYPE html>
</html>
    <head>
        <script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>
    </head>
    <body>
        <script>
            var wikiUrl = "http://en.wikipedia.org/w/api.php?action=opensearch&search=New_York_Yankees&format=json&callback=wikiCallbackFunction";

            $.ajax(wikiUrl, {
                dataType: "jsonp",
                success: function( wikiResponse ) {
                    console.log( wikiResponse[2][0] );
                }
            });
        </script>   
    </body>
</html>

http://en.wikipedia.org/w/api.php is the endpoint for your url. You can see how to structure your url by visiting: http://www.mediawiki.org/wiki/API:Main_page

I used jsonp as the dataType to allow cross-site requests. More can be found here: http://www.mediawiki.org/wiki/API:Cross-site_requests

Last but not least, make sure to reference the Jquery.ajax() API: http://api.jquery.com/jquery.ajax/

Wiki Parser converts Wikipedia dumps into XML. It is also quite fast. You can then use any XML processing tool to handle the data from the parsed Wikipedia articles.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top