Question

At the moment, I'm getting rows with Unicode decoding issues when I query DBpedia (which runs on Virtuoso) with SPARQL. For example, I get Knut %C3%85ngstr%C3%B6m, but the correct name is Knut Ångström. How do I fix this? My query is:

select distinct
  (strafter(str(?influencerString), str(dbpedia:)) as ?influencerName)
  (strafter(str(?influenceeString), str(dbpedia:)) as ?influenceeName)
where {
  { ?influencer a dbpedia-owl:Person . ?influencee a dbpedia-owl:Person .
    ?influencer dbpedia-owl:influenced ?influencee .
    bind( replace( str(?influencer), "_", " " ) as ?influencerString )
    bind( replace( str(?influencee), "_", " " ) as ?influenceeString )
  }
  UNION
  { ?influencee a dbpedia-owl:Person . ?influencer a dbpedia-owl:Person .
    ?influencee dbpedia-owl:influencedBy ?influencer .
    bind( replace( str(?influencee), "_", " " ) as ?influenceeString )
    bind( replace( str(?influencer), "_", " " ) as ?influencerString )
  }
}

Solution

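The DBpedia wiki explains that the identifiers for resources in the English DBpedia dataset are URIs, not IRIs, so non-ASCII characters in a name are percent-encoded as UTF-8 bytes (Å becomes %C3%85, ö becomes %C3%B6). That is why you end up with names like the one above. The relevant section reads: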

3. Denoting or Naming “things”

Each thing in the DBpedia data set is denoted by a de-referenceable IRI- or URI-based reference of the form http://dbpedia.org/resource/Name, where Name is derived from the URL of the source Wikipedia article, which has the form http://en.wikipedia.org/wiki/Name. Thus, each DBpedia entity is tied directly to a Wikipedia article. Every DBpedia entity name resolves to a description-oriented Web document (or Web resource).

Until DBpedia release 3.6, we only used article names from the English Wikipedia, but since DBpedia release 3.7, we also provide localized datasets that contain IRIs like http://xx.dbpedia.org/resource/Name, where xx is a Wikipedia language code and Name is taken from the source URL, http://xx.wikipedia.org/wiki/Name.

Starting with DBpedia release 3.8, we use IRIs for most DBpedia entity names. IRIs are more readable and generally preferable to URIs, but for backwards compatibility, we still use URIs for DBpedia resources extracted from the English Wikipedia and IRIs for all other languages. Triples in Turtle files use IRIs for all languages, even for English.

There are several details on the encoding of URIs that should always be taken into account.
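Standard SPARQL 1.1 has ENCODE_FOR_URI but no function for the reverse, so if you really do want the name from the URI itself rather than from a label, one option is to decode it on the client side. A minimal Python sketch, assuming English DBpedia resource URIs of the form http://dbpedia.org/resource/Name with UTF-8 percent-escapes and underscores for spaces; `readable_name` is a hypothetical helper, not part of any DBpedia tooling:

```python
from urllib.parse import unquote

# Assumption: English DBpedia resource URIs look like
# http://dbpedia.org/resource/<Name>, where <Name> is the percent-encoded
# (UTF-8) Wikipedia article title with underscores in place of spaces.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def readable_name(uri: str) -> str:
    """Hypothetical helper: turn a DBpedia resource URI into a readable name."""
    if not uri.startswith(DBPEDIA_RESOURCE):
        raise ValueError(f"not an English DBpedia resource URI: {uri}")
    local_name = uri[len(DBPEDIA_RESOURCE):]
    # unquote() decodes %XX escapes as UTF-8 by default,
    # so %C3%85 becomes "Å" and %C3%B6 becomes "ö".
    return unquote(local_name).replace("_", " ")


print(readable_name("http://dbpedia.org/resource/Knut_%C3%85ngstr%C3%B6m"))
```

This prints Knut Ångström. Getting a label, as below, avoids the decoding step altogether.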

In this particular case, though, you don't really need to take the identifier apart; what you want is a label for the entity.

## If things were guaranteed to have just one English label, 
## we could simply take ?xLabel as the value that we want with
## `select ?xLabel { … }`, but since there might be more than 
## one, we can group by `?x` and then take a sample from the
## set of labels for each `?x`.

select (sample(?xLabel) as ?label) {
  ?x dbpedia-owl:influenced dbpedia:August_Kundt ;
     rdfs:label ?xLabel .
  filter(langMatches(lang(?xLabel),"en"))
}
group by ?x

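To make the grouping step concrete, here is a small Python sketch of what the filter, group by ?x, and sample(?xLabel) steps do to a set of solutions. The rows are made-up placeholders rather than real DBpedia data, and `lang_matches_en` is a hypothetical stand-in that only approximates langMatches for the "en" range:

```python
from collections import defaultdict

# Made-up (entity, label, language tag) rows standing in for the
# solutions of `?x rdfs:label ?xLabel`; not real DBpedia data.
rows = [
    ("ex:A", "Alpha", "en"),
    ("ex:A", "Alpha (physicist)", "en-GB"),
    ("ex:A", "Alfa", "de"),
    ("ex:B", "Beta", "en"),
    ("ex:C", "Gamma", "fr"),
]


def lang_matches_en(tag: str) -> bool:
    """Rough stand-in for langMatches(tag, "en"): "en" or any "en-*" tag."""
    tag = tag.lower()
    return tag == "en" or tag.startswith("en-")


def one_label_per_entity(rows):
    groups = defaultdict(list)
    for entity, label, lang in rows:
        if lang_matches_en(lang):          # the filter(...) step
            groups[entity].append(label)   # the group by ?x step
    # sample(?xLabel) may return any member of a group; this takes the first.
    return {entity: labels[0] for entity, labels in groups.items()}


print(one_label_per_entity(rows))
```

Entities with no English label (ex:C here) drop out entirely, because the filter removes all of their solutions before grouping.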

Applying the same idea to your query, and simplifying it a bit, gives:

select
  (sample(?rLabel) as ?influencerName)
  (sample(?eLabel) as ?influenceeName)
where {
  ?influencer dbpedia-owl:influenced|^dbpedia-owl:influencedBy ?influencee .
  dbpedia-owl:Person ^a ?influencer, ?influencee .

  ?influencer rdfs:label ?rLabel .
  filter( langMatches(lang(?rLabel),"en") )

  ?influencee rdfs:label ?eLabel .
  filter( langMatches(lang(?eLabel),"en") )
}
group by ?influencer ?influencee

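The alternative path influenced|^influencedBy matches a pair whichever way the relationship is stated, which is what the two UNION branches in your query did by hand. If both directions are asserted, the same pair matches twice, and group by ?influencer ?influencee collapses those duplicates. A rough Python sketch of that behaviour over made-up triples (the Person type checks are left out and the predicate names are abbreviated; `influence_pairs` is a hypothetical helper):

```python
# Made-up (subject, predicate, object) triples; not real DBpedia data.
triples = [
    ("ex:A", "influenced", "ex:B"),
    ("ex:B", "influencedBy", "ex:A"),  # same relationship, stated the other way
    ("ex:C", "influencedBy", "ex:A"),
    ("ex:D", "influenced", "ex:E"),
]


def influence_pairs(triples):
    """(influencer, influencee) pairs matched by influenced|^influencedBy."""
    pairs = []
    for s, p, o in triples:
        if p == "influenced":        # forward path: s influenced o
            pairs.append((s, o))
        elif p == "influencedBy":    # inverse path: o influenced s
            pairs.append((o, s))
    return pairs


pairs = influence_pairs(triples)
print(len(pairs))            # duplicates kept, like the path itself
print(sorted(set(pairs)))    # what grouping by both variables leaves
```

Here ("ex:A", "ex:B") appears twice in `pairs` but only once after deduplication.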

If you don't want language tags on those results, then add a call to str():

select
  (str(sample(?rLabel)) as ?influencerName)
  (str(sample(?eLabel)) as ?influenceeName)
where {
  ?influencer dbpedia-owl:influenced|^dbpedia-owl:influencedBy ?influencee .
  dbpedia-owl:Person ^a ?influencer, ?influencee .

  ?influencer rdfs:label ?rLabel .
  filter( langMatches(lang(?rLabel),"en") )

  ?influencee rdfs:label ?eLabel .
  filter( langMatches(lang(?eLabel),"en") )
}
group by ?influencer ?influencee


License: CC-BY-SA (attribution required)