PHP connecting to MediaWiki API and retrieve data

https://stackoverflow.com/questions/1897511

19-09-2019
|

Question

I noticed there was a question somewhat similar to mine, only with c#:link text. Let me explain: I'm very new to the whole web-services implementation and so I'm experiencing some difficulty understanding (especially due to the vague MediaWiki API manual).

I want to retrieve the entire page as a string in PHP (XML file) and then process it in PHP (I'm pretty sure there are other more sophisticated ways to parse XML files but whatever): Main Page wikipedia.

I tried doing $fp = fopen($url,'r');. It outputs: HTTP request failed! HTTP/1.0 400 Bad Request. The API does not require a key to connect to it.

Can you describe in detail how to connect to the API and get the page as a string?

EDIT: The URL is $url='http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&redirects&titles=Main Page';. I simply want to read the entire content of the file into a string to use it.

Solution

Connecting to that API is as simple as retrieving the file,

fopen

$url = 'http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&redirects&titles=Main%20Page';
$fp = fopen($url, 'r');
while (!feof($fp)) {
    $c .= fread($fp, 8192);
}
echo $c;

file_get_contents

$url = 'http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&redirects&titles=Main%20Page';
$c = file_get_contents($url);
echo $c;

The above two can only be used if your server has the fopen wrappers enabled.

Otherwise if your server has cURL installed you can use that,

$url = 'http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&redirects&titles=Main%20Page';
$ch = curl_init($url);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
$c = curl_exec($ch);
echo $c;

OTHER TIPS

You probably need to urlencode the parameters that you are passing in the query string ; here, at least the "Main Page" requires encoding -- without this encoding, I get a 400 error too.

If you try this, it should work better (note the space is replaced by %20) :

$url='http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&redirects&titles=Main%20Page';
$str = file_get_contents($url);
var_dump($str);

With this, I'm getting the content of the page.

A solution is to use urlencode, so you don't have to encode yourself :

$url='http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&redirects&titles=' . urlencode('Main Page');
$str = file_get_contents($url);
var_dump($str);

According to the MediaWiki API docs, if you don't specify a User-Agent in your PHP request, WikiMedia will refuse the connection with a 4xx HTTP response code:

https://www.mediawiki.org/wiki/API:Main_page#Identifying_your_client

You might try updating your code to add that request header, or change the default setting in php.ini if you have edit access to that.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow