Question

I want to query the Wikipedia API for information found in the Infobox video game template. So I make the following API call to get a list of pages that have the Infobox video game template embedded in them:

http://en.wikipedia.org/w/api.php?action=query&generator=embeddedin&geititle=template:infobox%20video%20game

And I get the following result:

<?xml version="1.0"?>
<api>
  <query-continue>
    <embeddedin geicontinue="10|Infobox_video_game|8484" />
  </query-continue>
  <query>
    <pages>
      <page pageid="785" ns="0" title="Asteroids (video game)" />
      <page pageid="2215" ns="0" title="Sid Meier&#039;s Alpha Centauri" />
      <page pageid="4098" ns="0" title="Puzzle Bobble" />
      <page pageid="4965" ns="0" title="Bubble Bobble" />
      <page pageid="6023" ns="0" title="Castle of the Winds" />
      <page pageid="6259" ns="0" title="Civilization (video game)" />
      <page pageid="6614" ns="0" title="Chrono Trigger" />
      <page pageid="7431" ns="0" title="Counter-Strike" />
      <page pageid="7840" ns="0" title="Chrono Cross" />
      <page pageid="8090" ns="0" title="Day of the Tentacle" />
    </pages>
  </query>
</api>

Great. Perfect. Now, I don't want a list of all such pages, because that's almost totally useless. I want a list of all such pages that match "mario." So I make the following API call.

http://en.wikipedia.org/w/api.php?action=query&generator=embeddedin&geititle=template:infobox%20video%20game&list=search&srsearch=mario

And I get the following result (truncated to 2 results per list for readability):

<?xml version="1.0"?>
<api>
  <query-continue>
    <search sroffset="10" />
    <embeddedin geicontinue="10|Infobox_video_game|8484" />
  </query-continue>
  <query>
    <pages>
      <page pageid="785" ns="0" title="Asteroids (video game)" />
      <page pageid="2215" ns="0" title="Sid Meier&#039;s Alpha Centauri" />
    </pages>
    <searchinfo totalhits="39118" />
    <search>
      <p ns="0" title="Mario" snippet="is a fictional character  in the &lt;span class=&#039;searchmatch&#039;&gt;Mario&lt;/span&gt; video game franchise  by Nintendo , created by Japanese video game designer  Shigeru Miyamoto .  &lt;b&gt;...&lt;/b&gt; " size="58561" wordcount="8141" timestamp="2014-03-16T02:28:37Z" />
      <p ns="0" title="Mario (disambiguation)" snippet="&lt;span class=&#039;searchmatch&#039;&gt;Mario&lt;/span&gt;  is a fictional character in his eponymous video game series. &lt;span class=&#039;searchmatch&#039;&gt;Mario&lt;/span&gt; may also refer to:  People : &lt;span class=&#039;searchmatch&#039;&gt;Mario&lt;/span&gt; (given name), a list of people  &lt;b&gt;...&lt;/b&gt; " size="1354" wordcount="189" timestamp="2013-08-26T18:46:18Z" />
    </search>
  </query>
</api>

Which is also perfect, except not so much: it gave me two essentially separate query results in the same call, which is never what I want. Is there a way to search within the results I get from an embeddedin list, or is the Wikipedia API basically useless?


Solution

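The API is not “basically useless”, but it can't answer this particular kind of query in a single call. A generator only feeds its pages into prop= modules; list=search runs as an independent query alongside it, which is why you got two unrelated result sets in one response.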
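If you want to stay with the API, the filtering has to happen on your side: page through the embeddedin results and keep the titles that match. The following is only a sketch under assumptions: it expects the legacy `query-continue` XML shape shown in the question, the function names are made up for illustration, and the User-Agent string is a placeholder you would replace with your own.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

API_URL = "http://en.wikipedia.org/w/api.php"


def build_embeddedin_url(template, continue_token=None, limit=500):
    """Build one embeddedin generator request (hypothetical helper)."""
    params = {
        "action": "query",
        "generator": "embeddedin",
        "geititle": template,
        "geilimit": str(limit),
        "format": "xml",
    }
    if continue_token is not None:
        params["geicontinue"] = continue_token
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return (titles, continue_token) from one XML response.

    Assumes the <query-continue><embeddedin geicontinue=.../> layout
    from the question; the token is None on the last page.
    """
    root = ET.fromstring(xml_text)
    titles = [page.get("title") for page in root.iter("page")]
    cont = root.find("./query-continue/embeddedin")
    token = cont.get("geicontinue") if cont is not None else None
    return titles, token


def filter_titles(titles, needle):
    """Case-insensitive substring match on page titles."""
    needle = needle.lower()
    return [t for t in titles if needle in t.lower()]


def iter_matching_titles(template, needle):
    """Walk every result page over the network and yield matching titles."""
    token = None
    while True:
        request = urllib.request.Request(
            build_embeddedin_url(template, token),
            headers={"User-Agent": "example-script/0.1 (placeholder)"},
        )
        with urllib.request.urlopen(request) as response:
            titles, token = parse_page(response.read().decode("utf-8"))
        yield from filter_titles(titles, needle)
        if token is None:
            break
```

Calling `list(iter_matching_titles("Template:Infobox video game", "mario"))` would fetch every page that embeds the template, so it is slow and only matches on titles, not on infobox contents.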

One way to work around that would be to use DBpedia, which extracts information from Wikipedia infoboxes and can be queried using SPARQL.

To get all video games that contain Mario in their title, you could use something like:

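PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT *
WHERE {
    ?game rdf:type dbpedia-owl:VideoGame.
    ?game rdfs:label ?label.
    # str() drops the language tag so regex sees a plain string
    FILTER regex(str(?label), "Mario")
}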

Or, maybe even better, get all games in the Super Mario series:

PREFIX dbpprop: <http://dbpedia.org/property/>

SELECT *
WHERE {
    <http://dbpedia.org/resource/Super_Mario_(series)> dbpprop:game ?game
}
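
To run such a query from a script rather than the web form, something like the following could work. It is a sketch, assuming the public endpoint at http://dbpedia.org/sparql accepts a `format` parameter for JSON output and returns the standard SPARQL JSON results layout; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "http://dbpedia.org/sparql"

SERIES_QUERY = """
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT *
WHERE {
    <http://dbpedia.org/resource/Super_Mario_(series)> dbpprop:game ?game
}
"""


def build_sparql_url(query):
    """Encode a SPARQL query as a GET request URL (hypothetical helper)."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return SPARQL_ENDPOINT + "?" + urllib.parse.urlencode(params)


def extract_column(results_json, var):
    """Pull the values bound to one variable out of a SPARQL JSON result."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return [row[var]["value"] for row in bindings if var in row]


def run_query(query, var):
    """Send the query over the network and return the values for `var`."""
    with urllib.request.urlopen(build_sparql_url(query)) as response:
        return extract_column(response.read().decode("utf-8"), var)
```

`run_query(SERIES_QUERY, "game")` would then return the resource URIs bound to `?game`; what comes back depends on the current state of the DBpedia dataset.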
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow