Question

I'm trying to figure out if there's a way to determine whether a given article refers to a Person, Organization or Location. I imagine the answer lies somewhere in the "categories" and "clcategories" parameters... however, here's the issue.

Take Albert Einstein for example. The results for the query:

https://en.wikipedia.org/w/api.php?action=query&titles=Albert%20Einstein&prop=categories&clcategories=Category:People%20from%20Berlin

...show me that, indeed, Albert Einstein is a member of the category "People from Berlin".

Similarly, just by browsing through the Category tree on Wikipedia, I can show that "People from Berlin" is a subcategory of the category "People", via this path:

People > People_categories_by_parameter > People by place > People by city > People by country and city > People by city in Germany > People from Berlin

However, Albert Einstein isn't (directly) a member of the category "People", so this query:

https://en.wikipedia.org/w/api.php?action=query&titles=Albert%20Einstein&prop=categories&clcategories=Category:People

...gets me no results under Categories, i.e. it's not a match.

Is there some way to find out whether a page is a member of any Category X, where category X is a descendant of a specified Category Y?

Thanks!

Solution

I don't know of a way to do this with the Wikipedia API, but I can think of a Freebase way. The following Freebase query returns the Freebase "types" associated with a given Wikipedia article. Types such as "People", "Politicians", "Artists" and "Places" are all easy to recognize in the result.

{
  "key": [{
    "namespace": "/wikipedia/en",
    "value": "William_Ambrose"
  }],
  "type": []
}

(Replace "en" with the actual Wikipedia language code, of course, and "William_Ambrose" with the Wikipedia article name. See my note below on escaping, though!)

The result, in this case, is:

{
  "result": {
    "type": [
      "/common/topic",
      "/people/person",
      "/people/deceased_person",
      "/government/politician"
    ],
    "key": [{
      "namespace": "/wikipedia/en",
      "value": "William_Ambrose"
    }]
  }
}

... which clearly means that's a "Person" and a "Politician" (and also a "deceased person" at that, but that's another matter.)
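As a rough illustration of turning that type list into the Person / Organization / Location answer the question asks for, here is a minimal Python sketch. Only "/people/person" appears in the result above; the "/organization/organization" and "/location/location" ids, the TYPE_TO_CLASS table and the classify helper are assumptions made for this example, not part of any Freebase client library.

```python
# Assumed mapping from Freebase type ids to the three coarse classes.
# Only "/people/person" is confirmed by the sample result; the other
# two ids are assumptions for illustration.
TYPE_TO_CLASS = {
    "/people/person": "Person",
    "/organization/organization": "Organization",
    "/location/location": "Location",
}


def classify(types):
    """Return the sorted coarse classes implied by a list of type ids."""
    return sorted({TYPE_TO_CLASS[t] for t in types if t in TYPE_TO_CLASS})


# The query result shown above, as a Python dict.
result = {
    "result": {
        "type": [
            "/common/topic",
            "/people/person",
            "/people/deceased_person",
            "/government/politician",
        ],
        "key": [{"namespace": "/wikipedia/en", "value": "William_Ambrose"}],
    }
}

print(classify(result["result"]["type"]))  # prints ['Person']
```

Types that are not in the table (such as "/common/topic") are simply ignored, so an empty list means none of the three classes matched.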

See my answer to "get wikipedia linked links" for notes on how the API works and a REST example. In particular, take a good look at the notes on getting API keys from Google and on Freebase-escaping the strings.

Good luck.

OTHER TIPS

Nowadays you should ask Wikidata, whose property P31 will tell you things like "is a human".
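A minimal Python sketch of that check, assuming the Wikidata wbgetentities API is used to look up the item linked to an English Wikipedia title. Q5 is the Wikidata item for "human"; the function names, the User-Agent string and the injectable fetch parameter are choices made for this example.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
HUMAN = "Q5"  # Wikidata item for "human"


def fetch_entities(title, site="enwiki"):
    """Fetch the claims of the Wikidata item linked to a Wikipedia title."""
    params = urllib.parse.urlencode({
        "action": "wbgetentities",
        "sites": site,
        "titles": title,
        "props": "claims",
        "format": "json",
    })
    request = urllib.request.Request(
        f"{WIKIDATA_API}?{params}",
        headers={"User-Agent": "instance-of-sketch/0.1"},  # hypothetical agent name
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def instance_of(response):
    """Return the set of item ids listed under P31 in a wbgetentities response."""
    ids = set()
    for entity in response.get("entities", {}).values():
        for claim in entity.get("claims", {}).get("P31", []):
            snak = claim.get("mainsnak", {})
            if snak.get("snaktype") == "value":
                ids.add(snak["datavalue"]["value"]["id"])
    return ids


def is_human(title, fetch=fetch_entities):
    """True if the article's Wikidata item is a direct instance of human (Q5)."""
    return HUMAN in instance_of(fetch(title))
```

Calling is_human("Albert Einstein") performs a live request. Organizations and locations are usually instances of more specific classes (a particular kind of company or city, say), so a direct P31 comparison like this one is likely to be enough only for the "person" case; the other two would need a walk up the subclass hierarchy.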

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow