Question

I am learning RDF and, as an exercise, am trying to pull a set of facts out of DBpedia. The following code sample mostly works, but for properties such as spouse it also returns the person themselves.

QUESTIONS:

  1. get_name_from_uri() pulls out the last part of the URI and replaces the underscores with spaces. There has got to be a better way to get a person's name.
  2. Results for spouse return the spouse but also the data subject. I am not sure what is going on there.
  3. Some results return data both as URIs and as text items.

This is the output from the code block; it shows some of the odd results I am getting (see the mixed output in the properties, the fact that he is married to himself, and the mangled name of Joséphine).

Accessing facts for Napoleon  held at  http://dbpedia.org/resource/Napoleon

There are  800  facts about Napoleon stored at the URI
http://dbpedia.org/resource/Napoleon

Here are a few:-
Ontology:deathdate

Napoleon died on 1821-05-05

Ontology:birthdate
Napoleon was born on 1769-08-15

Property:spouse returns the person themselves twice!
Napoleon was married to  Marie Louise, Duchess of Parma
Napoleon was married to  Napoleon
Napoleon was married to  Jos%C3%A9phine de Beauharnais
Napoleon was married to  Napoleon

Property:title returns text and URIs
Napoleon  Held the title:  "The Death of Napoleon"
Napoleon  Held the title: http://dbpedia.org/resource/Emperor_of_the_French
Napoleon  Held the title: http://dbpedia.org/resource/King_of_Italy
Napoleon  Held the title:  First Consul of France
Napoleon  Held the title:  Provisional Consul of France
Napoleon  Held the title:  http://dbpedia.org/resource/Napoleon
Napoleon  Held the title:  Emperor of the French
Napoleon  Held the title: http://dbpedia.org/resource/Co-Princes_of_Andorra
Napoleon  Held the title:  from the Memoirs of Bourrienne, 1831
Napoleon  Held the title:  Protector of the Confederation of the Rhine

Ontology birth place returns three records
Napoleon was born in  Ajaccio
Napoleon was born in  Corsica
Napoleon was born in  Early modern France

This is the Python that produces the output above; it requires rdflib and is very much a work in progress.

import rdflib
from rdflib import Graph, URIRef, RDF

######################################
#  A quick test of the Python library rdflib to get data from an RDF graph
# D Moore 15/3/2014
# needs rdflib > version 3.0

# CHANGE THE URI BELOW TO A DIFFERENT PERSON AND SEE WHAT HAPPENS
# COULD DO WITH A WEB FORM 
# NOTES:
#
#URI_ref = 'http://dbpedia.org/resource/Richard_Nixon'
#URI_ref = 'http://dbpedia.org/resource/Margaret_Thatcher'
#URI_ref = 'http://dbpedia.org/resource/Isaac_Newton'
#URI_ref = 'http://dbpedia.org/resource/Richard_Nixon'
URI_ref = 'http://dbpedia.org/resource/Napoleon'
#URI_ref = 'http://dbpedia.org/resource/apple'
##########################################################


def get_name_from_uri(dbpedia_uri):  
    # pulls the last part of a uri out and removes underscores
    # got to be an easier way but it works
    output_string = ""
    s = dbpedia_uri
    # chop the url into bits divided by the /
    tokens = s.split("/")
    # because the name of our person is in the last section, iterate through each token
    # and replace the underscores with spaces
    for i in tokens :
        str = ''.join([i])
        output_string = str.replace('_',' ')
    # returns the name of the person without underscores 
    return(output_string)

def is_person(uri):
#####  SPARQL way to do this
    uri = URIRef(uri)
    person = URIRef('http://dbpedia.org/ontology/Person')
    g= Graph()
    g.parse(uri)
    resp = g.query(
        "ASK {?uri a ?person}",
        initBindings={'uri': uri, 'person': person}
    )
    print uri, "is a person?", resp.askAnswer
    return resp.askAnswer

URI_NAME = get_name_from_uri(URI_ref)
NAME_LABEL = ''

if is_person(URI_ref):
    print "Accessing facts for", URI_NAME, " held at ", URI_ref

    g = Graph()
    g.parse(URI_ref)
    print "Person Extract for", URI_NAME
    print "There are ",len(g)," facts about", URI_NAME, "stored at the URI ",URI_ref
    print "Here are a few:-"


    # Ok so lets get some facts for our person
    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/birthName")):
        print URI_NAME, "was born " + str(stmt[1])

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/deathDate")):
        print URI_NAME, "died on", str(stmt[1])

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/birthDate")):
        print URI_NAME, "was born on", str(stmt[1])

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/eyeColor")):
        print URI_NAME, "had eyes coloured", str(stmt[1])

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/spouse")):
        print URI_NAME, "was married to ", get_name_from_uri(str(stmt[1]))

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/reigned")):
        print URI_NAME, "reigned ", get_name_from_uri(str(stmt[1]))

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/children")):
        print URI_NAME, "had a child called ", get_name_from_uri(str(stmt[1]))

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/profession")):
        print URI_NAME, "(PROPERTY profession) was trained as a  ", get_name_from_uri(str(stmt[1]))

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/child")):
        print URI_NAME, "PROPERTY child ", get_name_from_uri(str(stmt[1]))

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/deathplace")):
        print URI_NAME, "(PROPERTY death place) died at: ", str(stmt[1])

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/title")):
        print URI_NAME, "(PROPERTY title) Held the title: ", str(stmt[1])


    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/sex")):
        print URI_NAME, "was a ", str(stmt[1])

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/knownfor")):
        print URI_NAME, "was known for ", str(stmt[1])

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/birthPlace")):
        print URI_NAME, "was born in ", get_name_from_uri(str(stmt[1]))

else:
    print "ERROR - "
    print "Resource", URI_ref, 'does not look to be a person or there is no record in dbpedia'

Solution

Getting names

*get_name_from_uri* derives the name by manipulating the URI string. Since DBpedia data has rdfs:labels on almost everything, it's probably a better idea to ask for the rdfs:label and to use that as the value. E.g., look at the results of this SPARQL query run against the DBpedia SPARQL endpoint:

select ?spouse ?spouseName where {
  dbpedia:Napoleon dbpedia-owl:spouse ?spouse .
  ?spouse rdfs:label ?spouseName .
  filter( langMatches(lang(?spouseName),"en") )
}
spouse                                                      spouseName
http://dbpedia.org/resource/Jos%C3%A9phine_de_Beauharnais   "Joséphine de Beauharnais"@en
http://dbpedia.org/resource/Marie_Louise,_Duchess_of_Parma  "Marie Louise, Duchess of Parma"@en

Unexpected Spouses

The documentation for subject_objects says that

subject_objects(self, predicate=None)

A generator of (subject, object) tuples for the given predicate

You're seeing, correctly, that there are four triples in DBpedia that have the predicate dbpprop:spouse (by the way, is there a reason you're not using dbpedia-owl:spouse?) and have Napoleon as a subject or object:

Napoleon                       spouse Marie Louise, Duchess of Parma
Marie Louise, Duchess of Parma spouse Napoleon 
Napoleon                       spouse Jos%C3%A9phine de Beauharnais
Jos%C3%A9phine de Beauharnais  spouse Napoleon

For each one of those, you're printing out

"Napoleon was married to X"

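where X is the object of the triple. In two of those triples the object is Napoleon himself, which is why he appears to be married to himself. Perhaps you should use objects instead, passing Napoleon as the subject: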

objects(self, subject=None, predicate=None)

A generator of objects with the given subject and predicate

URI vs. text (literal) results

The data described by DBpedia ontology properties (those whose URIs begin with http://dbpedia.org/ontology/, typically abbreviated dbpedia-owl:) is much “cleaner” than the data described by the DBpedia raw data properties (those whose URIs begin with http://dbpedia.org/property/, typically abbreviated dbpprop:). E.g., when you're looking at the titles, you're using the property dbpprop:title, and there are both URIs and literals as values. It doesn't look like there's a dbpedia-owl:title, though, so in this case you'll just have to deal with it. It's easy enough to filter out one or the other though:

select ?title where {
  dbpedia:Napoleon dbpprop:title ?title
  filter isLiteral(?title)
}
title
================================================
"Emperor of the French"@en
"Protector of the Confederation of the Rhine"@en
"First Consul of France"@en
"Provisional Consul of France"@en
""The Death of Napoleon""@en
"from the Memoirs of Bourrienne, 1831"@en
select ?title where {
  dbpedia:Napoleon dbpprop:title ?title
  filter isURI(?title)
}
title
=================================================
http://dbpedia.org/resource/Co-Princes_of_Andorra
http://dbpedia.org/resource/Emperor_of_the_French
http://dbpedia.org/resource/King_of_Italy
Licensed under: CC-BY-SA with attribution