Question

I am trying to parse a page on a wikia to get additional information from an Infobox Book template that is on the page. The problem is that I can only get the template's source instead of the transformed template as it appears on the page.

I'm using the following url as a base: http://starwars.wikia.com/api.php?format=xml&action=expandtemplates&text={{Infobox%20Book}}&generatexml=1

The documentation doesn't really tell me how to point it to a specific page and parse the transformed template from the page. Is this even possible or do I need to parse it all myself?

Solution

To expand a template with the parameters from a given page, you will have to provide those parameters yourself. There is no way for the API to know how the template is used on a particular page (it could even be used twice!).

This works:

action=expandtemplates&text={{Infobox Book|book name=Lost Tribe of the Sith: Skyborn}}

You will, of course, have to keep adding all the parameters you want to parse (there are 14 in your example).
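As a rough sketch of how that call could be assembled from a set of parameters: the helper names `template_call` and `expand_url` below are made up for illustration, and nothing here contacts the wiki, it only builds the URL. Parameter values containing `|` or `=` would need extra escaping that this sketch does not attempt.

```python
from urllib.parse import urlencode


def template_call(name, params):
    """Build wikitext such as {{Name|key=value|...}} from a dict.

    Hypothetical helper: values containing '|' or '=' are not escaped.
    """
    parts = [name] + [f"{key}={value}" for key, value in params.items()]
    return "{{" + "|".join(parts) + "}}"


def expand_url(api_base, wikitext):
    """Return an expandtemplates URL with the wikitext percent-encoded."""
    query = urlencode({
        "format": "xml",
        "action": "expandtemplates",
        "text": wikitext,
    })
    return f"{api_base}?{query}"


text = template_call(
    "Infobox Book",
    {"book name": "Lost Tribe of the Sith: Skyborn"},
)
url = expand_url("http://starwars.wikia.com/api.php", text)
print(url)
```

Add the remaining parameters to the dict in the same way; `urlencode` takes care of the braces, pipes and spaces.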

If your templates change automatically depending on which page they are on (that is not the case here), e.g. by using magic words such as {{PAGENAME}}, you can add &page=Lost_Tribe_of_the_Sith:_Skyborn to your API call to set the context in which the template is expanded.
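A minimal sketch of adding that context to the request follows. The function name `expand_in_context` is hypothetical, and the name of the context parameter is passed in explicitly: it defaults to `page` as written above, but some MediaWiki versions document it as `title` for expandtemplates, so check the api.php help on your wiki. The code only builds the URL.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def expand_in_context(api_base, wikitext, page_title, context_param="page"):
    """Build an expandtemplates URL that also names the page context.

    context_param is an assumption: "page" follows the answer above,
    other installations may expect "title" instead.
    """
    query = urlencode({
        "format": "xml",
        "action": "expandtemplates",
        "text": wikitext,
        context_param: page_title,
    })
    return f"{api_base}?{query}"


context_url = expand_in_context(
    "http://starwars.wikia.com/api.php",
    "{{PAGENAME}}",
    "Lost_Tribe_of_the_Sith:_Skyborn",
)

# Decode the query string again to confirm what would be sent.
sent = parse_qs(urlsplit(context_url).query)
print(sent)
```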

If you do not know the parameters given, you can either:

  1. Render the whole page with index.php?action=render&title=Lost_Tribe_of_the_Sith:_Skyborn, and parse the returned HTML to carve out the actual infobox

  2. Fetch the wikicode (action=query&prop=revisions), parse it to get the parameters passed to the template, and supply them to the expandtemplates call

  3. Start using an extension like Semantic MediaWiki, which allows you to treat your wiki more like a database

Options 1 and 2 can go wrong in any number of ways, of course: with a wiki you have, by definition, no guarantee that the content is always entered in a consistent way.
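For option 2, the parsing step might look roughly like the sketch below. It assumes you have already fetched the page's wikicode as a string; the sample text and the function names are invented for illustration. It only handles named parameters, skips positional ones, and does not understand triple-brace arguments, comments or nowiki sections, so treat it as a starting point rather than a real wikitext parser.

```python
def find_template(wikitext, name):
    """Return the inner text of the first {{name ...}} call, or None."""
    start = wikitext.find("{{" + name)
    if start == -1:
        return None
    depth = 0
    i = start
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                # Strip the outer braces of the matched call.
                return wikitext[start + 2:i - 2]
        else:
            i += 1
    return None  # unbalanced braces


def split_top_level(body):
    """Split on '|' that is not inside nested {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def template_params(wikitext, name):
    """Return the named parameters of the first call to `name` as a dict."""
    body = find_template(wikitext, name)
    if body is None:
        return None
    params = {}
    for part in split_top_level(body)[1:]:  # first part is the template name
        key, sep, value = part.partition("=")
        if sep:  # ignore positional parameters
            params[key.strip()] = value.strip()
    return params


# Made-up wikicode standing in for a fetched revision.
sample = (
    "Intro text.\n"
    "{{Infobox Book\n"
    "|book name=Lost Tribe of the Sith: Skyborn\n"
    "|author=[[Some Author|Author]]\n"
    "}}\n"
    "Body text."
)
print(template_params(sample, "Infobox Book"))
```

The resulting dict can then be turned back into a {{Infobox Book|...}} call and passed as the text argument to expandtemplates.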

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow