Question

I'm trying to extract alumni lists for universities using SPARQL.

I've identified the ontologies I need:

I tried this query:

SELECT * WHERE {
  ?University dbpedia2:alumni ?Person .
}

This seemed to make sense, except that it returns counts instead of people, even though the ontology says the property should contain people.

I found this query somewhere. It seemed to do a better job of finding universities, but it was very slow.

SELECT * WHERE {
  { <http://dbpedia.org/ontology/University> ?property ?hasValue }
  UNION
  { ?isValueOf ?property <http://dbpedia.org/ontology/University> }
}

I also tried going the other way, starting with all people and looking for their almae matres, in this form:

SELECT * WHERE {
  ?person dbpedia2:almaMater ?University
}

But this is much slower, possibly because searching through the people space is too laborious. It does actually work, but in practice it returns a different set of results: all people with a listed alma mater, rather than all people listed by universities as alumni. I'd prefer a syntax that gets me the alumni.

How can I phrase this to return all alumni listed for universities?

Solution

The performance of DBpedia's SPARQL endpoint can be a bit unreliable at times. After all, it's a public service and isn't intended for huge queries. Nonetheless, I think you can get what you're looking for here without too much trouble. First, you can check how many results there are with a query like this at the public SPARQL endpoint:

select (count(*) as ?nResults) where {
 ?person dbpedia-owl:almaMater ?almaMater
}

SPARQL results (64928)
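
If you'd rather send the count query from a script than from the web form, something like the following Python sketch could work. It only builds the HTTP request; the endpoint URL, the helper name build_request, and the explicit PREFIX line (added so the query doesn't depend on the endpoint's predefined prefixes) are assumptions on my part, not something the endpoint requires.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

# Assumption: the address of the public DBpedia SPARQL endpoint.
ENDPOINT = "https://dbpedia.org/sparql"

# The count query from above, with the prefix spelled out explicitly.
COUNT_QUERY = """
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
select (count(*) as ?nResults) where {
  ?person dbpedia-owl:almaMater ?almaMater
}
"""


def build_request(query, endpoint=ENDPOINT):
    """Return a GET request for a SPARQL query, asking for JSON results."""
    # The SPARQL protocol passes the query text in the `query` parameter.
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(COUNT_QUERY)

# Sending it needs network access, so it is left commented out here:
#   import json, urllib.request
#   with urllib.request.urlopen(request, timeout=60) as response:
#       data = json.load(response)
#   print(data["results"]["bindings"][0]["nResults"]["value"])
```

Keeping the request construction separate from the network call makes it easy to retry when the endpoint is having a slow moment.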

Now, if you just want the big list, you'd get it like this. The order by helps organize the results for easy consumption, but isn't technically necessary:

select ?almaMater ?person where {
 ?person dbpedia-owl:almaMater ?almaMater
}
order by ?almaMater ?person
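
Since the public endpoint isn't intended for huge queries, one option is to fetch the big list in pages. In that case the order by stops being optional, because limit/offset paging only gives consistent pages over a stable ordering. A minimal Python sketch that just generates the query strings; the function name paged_queries and the page size of 10000 are assumptions, and the total is the count reported above.

```python
def paged_queries(total, page_size=10000):
    """Yield one query per page, covering `total` rows in a stable order."""
    template = (
        "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>\n"
        "select ?almaMater ?person where {\n"
        "  ?person dbpedia-owl:almaMater ?almaMater\n"
        "}\n"
        "order by ?almaMater ?person\n"
        "limit %d offset %d"
    )
    # One query per page: offsets 0, page_size, 2 * page_size, ...
    for offset in range(0, total, page_size):
        yield template % (page_size, offset)


# 64928 is the count returned by the query above; it may differ today.
queries = list(paged_queries(64928))
```

Each generated string can then be sent to the endpoint on its own, with a pause between requests if the service is struggling.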

If you need to place some additional restrictions on ?almaMater, e.g., to ensure that it's a university, then you can add them to the query. For instance:

select ?almaMater ?person where {
 ?person dbpedia-owl:almaMater ?almaMater .
 ?almaMater a dbpedia-owl:University .
}
order by ?almaMater ?person
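
To turn the flat rows into one alumni list per university, you could group the results on the client side. Here is a rough Python sketch, assuming the response was requested in the standard SPARQL JSON results format; the function name group_alumni is hypothetical and the sample data uses made-up placeholder URIs, not real DBpedia results.

```python
from collections import defaultdict


def group_alumni(results):
    """Map each ?almaMater URI to the list of ?person URIs in a SPARQL JSON result."""
    alumni = defaultdict(list)
    for binding in results["results"]["bindings"]:
        university = binding["almaMater"]["value"]
        alumni[university].append(binding["person"]["value"])
    return dict(alumni)


# Made-up sample in the SPARQL JSON results layout (placeholder URIs).
sample = {
    "head": {"vars": ["almaMater", "person"]},
    "results": {"bindings": [
        {"almaMater": {"type": "uri", "value": "http://example.org/UniversityA"},
         "person": {"type": "uri", "value": "http://example.org/Person1"}},
        {"almaMater": {"type": "uri", "value": "http://example.org/UniversityA"},
         "person": {"type": "uri", "value": "http://example.org/Person2"}},
        {"almaMater": {"type": "uri", "value": "http://example.org/UniversityB"},
         "person": {"type": "uri", "value": "http://example.org/Person3"}},
    ]},
}

alumni_by_university = group_alumni(sample)
```

Because the query already sorts by ?almaMater and then ?person, each list comes out in the same order the endpoint returned it.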

OTHER TIPS

In your last query, you are almost there. However, you are currently asking for any resource that can take the place of the ?University variable. As you only want universities to take that place, you can use another triple to further restrict that variable:

SELECT * WHERE {
    ?University a dbpedia-owl:University .
    ?person dbpedia2:almaMater ?University .
}

This means that ?University can only be an individual of class dbpedia-owl:University (where dbpedia-owl is mapped to http://dbpedia.org/ontology/).

Your first query:

SELECT * WHERE {
  ?University dbpedia2:alumni ?Person .
}

isn't just returning counts; it's returning both counts and individual alumni. Apparently DBpedia's data here is of poor quality, and a number of triples misuse the dbpedia2:alumni relation.

You can filter out the counts by adding a second condition requiring that whatever binds to ?person be a member of the appropriate class:

SELECT * WHERE {
  ?university dbpedia2:alumni ?person .
  ?person rdf:type <http://dbpedia.org/ontology/Person>
}

Running this shows that very few individuals are tagged as alumni; unfortunately, the data is surprisingly scant.

Licensed under: CC-BY-SA with attribution