Question

I use the MediaWiki API to find the images in Wikipedia articles. However, I also get all the useless icons, like the broom shown when an article needs to be cleaned up, or the Creative Commons logo that marks content as released under a Creative Commons license.

Is there a way to detect which images are such icons so I can drop them? For example, is there a way to query the size at which the image was embedded (rather than the size of the original image, which might be huge even for icons) so that I can drop all the small ones? I'm not really interested in very small images anyway.


Solution

As far as I know, no. The size at which an image is embedded is simply not stored in the database, and is therefore not available via the API either.

Some things you could perhaps do include:

  • Load the HTML markup of the article (via the API action=parse, or simply via index.php with action=render) and extract the image sizes from it.

  • Simply build a list of images that should be excluded. You could do this programmatically (e.g. find all images used on all templates included in Category:Wikipedia maintenance templates and all its subcategories) or just add any unwanted images to the exclusion list as you come across them.
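The two ideas above can be combined in a rough sketch. It assumes the English Wikipedia endpoint, that the rendered HTML carries width and height attributes on its img tags, and that excluded images can be recognised by a substring of their src URL. The names ImgSizeParser, embedded_images, fetch_rendered_html, and the 50-pixel threshold are made up for illustration, so treat this as a starting point rather than a tested solution:

```python
import json
import urllib.parse
import urllib.request
from html.parser import HTMLParser

# Assumption: English Wikipedia; swap in the api.php URL of your wiki.
API_URL = "https://en.wikipedia.org/w/api.php"


class ImgSizeParser(HTMLParser):
    """Collects (src, width, height) for every <img> that declares a size."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        try:
            width = int(a.get("width") or "")
            height = int(a.get("height") or "")
        except ValueError:
            return  # no usable embedded size on this tag, skip it
        self.images.append((a.get("src") or "", width, height))


def embedded_images(html, min_side=50, excluded=()):
    """Return images whose embedded size is at least min_side on both sides.

    excluded holds file-name substrings (hypothetical exclusion list) that
    are matched against the src URL.
    """
    parser = ImgSizeParser()
    parser.feed(html)
    return [
        (src, w, h)
        for src, w, h in parser.images
        if min(w, h) >= min_side and not any(name in src for name in excluded)
    ]


def fetch_rendered_html(title):
    """Fetch the rendered article HTML via action=parse."""
    params = urllib.parse.urlencode({
        "action": "parse",
        "page": title,
        "prop": "text",
        "format": "json",
        "formatversion": "2",  # makes parse.text a plain string
    })
    request = urllib.request.Request(
        f"{API_URL}?{params}",
        headers={"User-Agent": "image-filter-sketch/0.1"},  # placeholder UA
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["parse"]["text"]
```

Calling embedded_images(fetch_rendered_html("Cat")) would then return only the larger embedded images. Entries can be added to excluded as unwanted images turn up, in the spirit of the second suggestion.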

Licensed under: CC-BY-SA with attribution