Question

I have many Wikipedia page IDs stored in a database, and some of them now redirect to other pages.

So I want to know how to get the new page IDs.

I checked on the Wikipedia website:

http://en.wikipedia.org/wiki/?curid=11601783

It says (Redirected from....), which means it is not the main page I want. The correct link should be:

http://en.wikipedia.org/wiki/?curid=34344124

So I want to know how to get the final page ID with an API query like:

http://en.wikipedia.org/w/api.php?action=query&format=json&prop=extracts&pageids=11601783

What parameters should I use?


Solution

To make the API resolve redirects, add the redirects parameter to the query. For example:

http://en.wikipedia.org/w/api.php?action=query&format=json&pageids=11601783&redirects

will give you the page id of the redirect target.
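As a rough illustration, the same single-page lookup could be scripted as below. This is a minimal sketch, not an official client: the helper names (build_redirect_query, target_page_ids, resolve_page_id) and the User-Agent string are made up for the example, and it assumes the default JSON layout in which query.pages is keyed by page id.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_redirect_query(page_id):
    """Build the query URL; 'redirects' is a flag, so any value enables it."""
    params = {
        "action": "query",
        "format": "json",
        "pageids": str(page_id),
        "redirects": "1",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def target_page_ids(response):
    """Return the ids of existing pages listed under query.pages."""
    pages = response.get("query", {}).get("pages", {})
    return [
        page["pageid"]
        for page in pages.values()
        if "pageid" in page and "missing" not in page
    ]


def resolve_page_id(page_id):
    """Fetch the target page id for one (possibly redirected) page id."""
    request = urllib.request.Request(
        build_redirect_query(page_id),
        # Placeholder User-Agent for the example; replace with your own.
        headers={"User-Agent": "redirect-example/0.1"},
    )
    with urllib.request.urlopen(request) as resp:
        ids = target_page_ids(json.load(resp))
    return ids[0] if ids else None
```

Calling resolve_page_id(11601783) should then return the id of the page that the redirect points to, or None if the page no longer exists.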

There doesn't seem to be a good way to do this using a single query for multiple pages, because the redirects part of the response maps from title to title, not page id (I'm assuming you don't know the title of the redirect page).

One way to work around that would be to combine redirects with prop=redirects:

http://en.wikipedia.org/w/api.php?action=query&format=json&pageids=11601783&redirects&prop=redirects&rdlimit=max

This will give you all redirects to the target page, including their page ids.
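For several page ids at once, the response of that combined query can be turned into an old-id to target-id mapping. The following is only a sketch under stated assumptions: build_batch_query and map_redirect_ids are hypothetical helper names, the response is assumed to use the default JSON layout (query.pages keyed by page id, each page carrying a redirects list with pageid entries), and continuation of long redirect lists is not handled.

```python
import urllib.parse

API_URL = "https://en.wikipedia.org/w/api.php"


def build_batch_query(page_ids):
    """Build one query URL for several page ids (joined with '|')."""
    params = {
        "action": "query",
        "format": "json",
        "pageids": "|".join(str(pid) for pid in page_ids),
        "redirects": "1",
        "prop": "redirects",
        "rdlimit": "max",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def map_redirect_ids(response, wanted_ids):
    """Map each wanted page id to the id of the page it resolves to."""
    wanted = set(wanted_ids)
    mapping = {}
    for page in response.get("query", {}).get("pages", {}).values():
        if "pageid" not in page or "missing" in page:
            continue  # skip pages that do not exist
        target_id = page["pageid"]
        if target_id in wanted:
            # The id was not a redirect; it maps to itself.
            mapping[target_id] = target_id
        for redirect in page.get("redirects", []):
            if redirect["pageid"] in wanted:
                mapping[redirect["pageid"]] = target_id
    return mapping
```

Ids that are absent from the returned mapping were either missing or fell outside the returned redirect list; in the latter case the continue value in the response would need to be followed with further requests.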

Licensed under: CC-BY-SA with attribution