Question

I'm trying to get the links from a particular page in the order they appear on the page, or reasonably close. I believe the parse request is the correct API call for this, but I'm getting a lot of what I consider "junk" links, which are really links inside references. For example, for Albert Einstein, I make the request (http://en.wikipedia.org/w/api.php?action=parse&format=json&page=Albert%20Einstein&redirects=&prop=links) and get links that occur in the references, such as E. T. Whittaker and JSTOR. For my purposes, these links in references are "junk".

Alternatively, I looked at the query command, but query with prop=links just returns the links alphabetized, which loses the ordering information I wanted to look at. That query also includes the same "junk" links from within references.

Is there any way to tell the parse command to ignore links that are inside reference tags, or do I need to retrieve the text using the API and then do the parsing myself client-side?


Solution 2

I don't think there is a direct way to do this. One workaround would be to get the text of the page, remove the code that actually shows the references ({{reflist}} or <references />) and then use the API to parse that. This will add a "junk" link to Help:Cite errors/Cite error refs without references, but it's easy to ignore that one page.
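A minimal Python sketch of this workaround, under some assumptions: the helper names (strip_reference_lists, build_parse_params, fetch_links) are made up for illustration, the regexes only handle a simple {{reflist}} without nested templates, and it assumes the server still drops reference links once the reference list is gone, which may depend on the MediaWiki and Cite versions and is worth verifying. It passes the edited wikitext back to action=parse through the text parameter.

```python
import json
import re
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"

# {{reflist}} or {{Reflist|30em}}; a reflist containing nested templates is NOT handled.
REFLIST_RE = re.compile(r"\{\{\s*reflist\b[^{}]*\}\}", re.IGNORECASE)
# <references /> or <references ...>...</references>
REFERENCES_TAG_RE = re.compile(
    r"<references\b[^>]*?/>|<references\b[^>]*>.*?</references\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_reference_lists(wikitext):
    """Remove the markup that renders the reference list."""
    text = REFLIST_RE.sub("", wikitext)
    return REFERENCES_TAG_RE.sub("", text)


def build_parse_params(wikitext, title):
    """Build action=parse parameters that parse the edited text instead of the stored page."""
    return {
        "action": "parse",
        "format": "json",
        "prop": "links",
        "contentmodel": "wikitext",
        "title": title,
        "text": strip_reference_lists(wikitext),
    }


def fetch_links(wikitext, title):
    """POST the edited wikitext to the API and return the decoded JSON response."""
    data = urllib.parse.urlencode(build_parse_params(wikitext, title)).encode("utf-8")
    # The User-Agent value is a placeholder; use one that identifies your tool.
    request = urllib.request.Request(
        API_URL, data=data, headers={"User-Agent": "links-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The request is sent as a POST because a full article's wikitext is usually too long for a query string. The Help:Cite errors link mentioned above would still need to be filtered out of the result.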

OTHER TIPS

I also don't think there is a way to get exactly what you're looking for. If you ask MediaWiki to parse the page, it is going to resolve all the template references before giving it back. If I needed to do what you're looking for, I would instead just get the raw wikitext of the page:

http://en.wikipedia.org/w/api.php?action=parse&format=json&page=Albert%20Einstein&redirects=&prop=wikitext

and then using that I would do my own parsing. It should be easy enough to use a regex to find all wikilinks. It would also be easy enough to remove all templates from the page.
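As a rough sketch of that client-side parsing in Python (the function name ordered_wikilinks and the regexes are illustrative assumptions, not a full wikitext parser): it drops <ref> tags, removes templates from the innermost outward, and then collects link targets in the order they appear. Note that removing every template also removes links inside infoboxes and similar templates, and namespaced links such as File: or Category: are not filtered here.

```python
import re

# <ref ... /> or <ref ...>...</ref>; the \b keeps this from matching <references>.
REF_RE = re.compile(
    r"<ref\b[^>]*?/>|<ref\b[^>]*>.*?</ref\s*>", re.IGNORECASE | re.DOTALL
)
# An innermost template: {{ ... }} with no braces inside.
TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")
# [[Target]], [[Target|label]] or [[Target#Section|label]]; group 1 is the target.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def ordered_wikilinks(wikitext):
    """Return link targets in page order, skipping references and templates."""
    text = REF_RE.sub("", wikitext)

    # Strip templates repeatedly so nested ones are removed from the inside out.
    while True:
        stripped = TEMPLATE_RE.sub("", text)
        if stripped == text:
            break
        text = stripped

    seen = set()
    links = []
    for match in WIKILINK_RE.finditer(text):
        target = match.group(1).strip()
        if target and target not in seen:  # keep only the first occurrence
            seen.add(target)
            links.append(target)
    return links
```

The wikitext itself would come from the prop=wikitext request shown above; the function only needs the string.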

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow