Unfortunately, this information is not stored in any structured manner — the table you see on the image description page is just a MediaWiki template that renders to an HTML table.
To extract the information from the template, you basically have three options:
1. Fetch the raw wiki markup of the image description page using prop=revisions and rvprop=content, and parse it yourself. Parsing wikitext reliably can be tricky, but several MediaWiki bot frameworks come with pretty good parsers built in.
2. Fetch the parsed HTML version of the page using action=parse, and use a standard HTML parser to extract the text from the table.
3. Since MediaWiki 1.20, you can also tell MediaWiki to parse the template markup for you and return an XML parse tree, by passing generatexml=1 to action=parse or, with the module prefix, rvgeneratexml=1 to prop=revisions. The relevant part will look something like this (reformatted for readability):
<template>
<title>BArch-image</title>
...
<part>
<name>depicted people</name> =
<value>
* Schmidt, Helmut: Bundeskanzler, Verteidigungsminister, SPD, Bundesrepublik Deutschland
</value>
</part>
...
</template>
This is not a perfectly clean representation of the data: it still contains some unparsed wikitext elements, like the * denoting a bulleted list item. Still, it should be much easier to parse than the completely raw MediaWiki template markup.
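To illustrate the parse-tree approach, here is a minimal Python sketch that pulls the parameters of one template out of such an XML fragment and strips the leftover list markers. The helper names template_params and bullet_items are made up for this example, and the sample input is a trimmed copy of the fragment above, wrapped in a <root> element on the assumption that the real output has a single enclosing element. Template titles are compared literally, and nested templates inside a value are simply flattened to text, so treat it as a starting point rather than a complete solution.

```python
import xml.etree.ElementTree as ET


def template_params(xml_text, template_name):
    """Return {parameter name: value text} for the first matching <template>.

    Hypothetical helper: matches the title literally and flattens any
    nested markup inside a value to plain text.
    """
    root = ET.fromstring(xml_text)
    for tpl in root.iter("template"):
        title = (tpl.findtext("title") or "").strip()
        if title != template_name:
            continue
        params = {}
        for part in tpl.findall("part"):
            name = part.find("name")
            value = part.find("value")
            if name is None or value is None:
                continue
            # Assumption: unnamed (positional) parameters carry an
            # index attribute on <name> instead of text.
            key = (name.text or name.get("index") or "").strip()
            params[key] = "".join(value.itertext()).strip()
        return params
    return {}


def bullet_items(value):
    """Split a wikitext bulleted list into its items, dropping the '*'."""
    items = []
    for line in value.splitlines():
        line = line.strip()
        if line.startswith("*"):
            items.append(line.lstrip("*").strip())
    return items


# Trimmed copy of the fragment shown above, wrapped in <root>.
SAMPLE = """<root><template>
<title>BArch-image</title>
<part>
<name>depicted people</name> =
<value>
* Schmidt, Helmut: Bundeskanzler, Verteidigungsminister, SPD, Bundesrepublik Deutschland
</value>
</part>
</template></root>"""

if __name__ == "__main__":
    params = template_params(SAMPLE, "BArch-image")
    for person in bullet_items(params["depicted people"]):
        print(person)
```

In practice you would feed template_params the XML string taken from the API response instead of SAMPLE; how that string is nested inside the response depends on the module and output format you request.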