Question

I'm trying to build a query with the Wikipedia API that returns all internal links from a specific article as page ids. I have the pageId of an article; for example, the id of the article "Android (operating system)" is 12610483. On the client side I need to work only with ids and later retrieve all information by id alone. My goal is to find all internal links (as article ids) from a given article id.

Unfortunately, the only way I have found returns the links as article titles: http://en.wikipedia.org/w/api.php?action=parse&format=json&pageid=12610483&prop=links

Is there a way to obtain the ids of the links as well, not only their titles?

Solution

What you want to do is to use action=query&prop=links to get data from the pagelinks database table, instead of parsing the page text.

This will still give you only page titles (because a link can lead to a non-existent page, which implies no page id).

But you can fix that by using prop=links as a generator:

http://en.wikipedia.org/w/api.php?action=query&format=json&pageids=12610483&generator=links&gpllimit=max

If the article has many links (like the one you suggested), you will need to use paging (see the gplcontinue element).
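As a rough illustration of the generator request with paging, here is a minimal Python sketch using only the standard library. The function names `http_fetch` and `linked_page_ids` and the User-Agent string are made up for this example. It also assumes the API returns a top-level `continue` object (containing `gplcontinue`) whose entries should be merged into the next request; check the response you actually get before relying on that.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def http_fetch(params):
    """Send one GET request to the API and return the decoded JSON."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # The User-Agent value is a placeholder; replace it with your own.
    request = urllib.request.Request(
        url, headers={"User-Agent": "link-id-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def linked_page_ids(page_id, fetch=http_fetch):
    """Collect the ids of all existing pages linked from the given page."""
    params = {
        "action": "query",
        "format": "json",
        "pageids": page_id,
        "generator": "links",
        "gpllimit": "max",
    }
    ids = []
    while True:
        data = fetch(params)
        pages = data.get("query", {}).get("pages", {})
        for page in pages.values():
            # Links to non-existent pages come back without a "pageid".
            if "pageid" in page:
                ids.append(page["pageid"])
        # Assumption: continuation data arrives as a top-level "continue"
        # object whose entries are merged into the next request.
        cont = data.get("continue")
        if not cont:
            return ids
        params = {**params, **cont}
```

Calling `linked_page_ids(12610483)` would then issue as many requests as needed and return a flat list of ids. The `fetch` parameter exists only so the paging loop can be exercised without network access.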

OTHER TIPS

I think you need to use the PHP Simple HTML DOM Parser.

You can find it here: http://simplehtmldom.sourceforge.net/

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow