Question

I would like to query the Wikimedia API for all images that match a keyword, keeping only the images that are in the public domain, i.e. with no additional CC-SA license.

Currently I'm using the following query to extract the images:

http://en.wikipedia.org/w/api.php?action=query&list=search&format=json&srsearch=roses&srnamespace=6&srinfo=totalhits%7Csuggestion&srprop=size%7Cwordcount%7Ctimestamp%7Cscore%7Csnippet%7Ctitlesnippet%7Credirecttitle%7Credirectsnippet%7Csectiontitle%7Csectionsnippet%7Chasrelated&srredirects=&srlimit=10&generator=images&titles=Wikipedia%3APublic_domain&gimlimit=10

But this currently returns all the images regardless of their licensing. Maybe I need to change the namespace, but I don't know where to look.

Thanks


Solution

Your current API query does two very distinct things:

  • get the first 10 images used on the page Wikipedia:Public domain (this is the pages result; you could specify additional properties to fetch for that result set)
  • search namespace 6 (File) for the word roses

Unfortunately, you can't restrict the search module to search only in some categories; you can only limit it by namespace. So you would need to get the categories of all search results and filter them yourself for images in the Category:Public Domain (and all its subcategories). To do that, use the search module as a generator and request the categories of each result. The API query would look like

api.php?action=query&prop=imageinfo|categories&generator=search&gsrsearch=roses&gsrnamespace=6&format=json
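If you build the request in code, the query above could be assembled roughly like this. This is only a sketch: the endpoint is the one from the question (switched to https), and `build_search_params`, `build_search_url`, `gsrlimit` and `cllimit=max` are my additions, not part of the original query.

```python
from urllib.parse import urlencode

# Endpoint taken from the question; adjust it for the wiki you query.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_search_params(keyword, limit=10):
    """Parameters for a generator=search query over the File namespace.

    Hypothetical helper: the name and the defaults are assumptions.
    """
    return {
        "action": "query",
        "format": "json",
        "generator": "search",           # run the search as a page generator
        "gsrsearch": keyword,            # the keyword, e.g. "roses"
        "gsrnamespace": 6,               # namespace 6 = File
        "gsrlimit": limit,               # search results per request (added)
        "prop": "imageinfo|categories",  # properties fetched for each result
        "cllimit": "max",                # as many categories per request as allowed (added)
    }


def build_search_url(keyword, limit=10):
    """Full request URL; urlencode escapes the '|' separator as %7C."""
    return API_URL + "?" + urlencode(build_search_params(keyword, limit))
```

Calling `build_search_url("roses")` gives a URL equivalent to the one above, with the two extra limit parameters.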

Don't forget to continue the query: if you want 10 images that match your category criteria, you might need to fetch (a lot) more than 10 search results.
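A minimal sketch of the continue-and-filter loop described above, using only the standard library. Several things here are assumptions on my part: the function names, the User-Agent string, the `max_requests` cap, and above all the category test, which simply looks for "public domain" in the category title and does not walk the real subcategory tree. Treat it as a starting point, not a reliable license check.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the question; adjust it for the wiki you query.
API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """Send one API request and decode the JSON response."""
    request = Request(
        API_URL + "?" + urlencode(params),
        # Hypothetical User-Agent; identify your own tool here.
        headers={"User-Agent": "pd-image-search-example/0.1"},
    )
    with urlopen(request) as response:
        return json.load(response)


def is_public_domain(page, marker="public domain"):
    """Assumed heuristic: any category title containing the marker text."""
    return any(
        marker in category["title"].lower()
        for category in page.get("categories", [])
    )


def find_public_domain_images(keyword, wanted=10, fetch=fetch_json, max_requests=20):
    """Collect up to `wanted` file titles that pass the category filter."""
    base = {
        "action": "query",
        "format": "json",
        "generator": "search",
        "gsrsearch": keyword,
        "gsrnamespace": 6,
        "prop": "imageinfo|categories",
        "cllimit": "max",
    }
    params = dict(base)
    found = []
    for _ in range(max_requests):
        data = fetch(params)
        pages = data.get("query", {}).get("pages", {})
        for page in pages.values():
            if is_public_domain(page) and page["title"] not in found:
                found.append(page["title"])
                if len(found) >= wanted:
                    return found
        if "continue" not in data:
            break  # no more results to fetch
        # Resend the original parameters plus the returned continue values.
        params = {**base, **data["continue"]}
    return found
```

`find_public_domain_images("roses")` would then keep requesting further batches until it has 10 matching titles, runs out of results, or hits the request cap. The `fetch` argument is there so the loop can be tried against canned responses without touching the network.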

Licensed under: CC-BY-SA with attribution