Question

I have a question for you.

I want to remove the duplicate triples in my RDF file. For example, I have these two groups of RDF triples:

Triple 1=

<rdf:Description rdf:about="http://Group/row1">
  <vocab:regione>Campania</vocab:regione>
  <vocab:nome>Napoli</vocab:nome>
  <vocab:codice>NA</vocab:codice>
</rdf:Description>

where vocab:regione, vocab:nome and vocab:codice are predicates.

Triple 2=

<rdf:Description rdf:about="http://Group/row1">
  <vocab:nome>Napoli</vocab:nome>
  <vocab:codice>NA</vocab:codice>
</rdf:Description>

where vocab:nome and vocab:codice are predicates.

In this case, "Triple 2" is included in "Triple 1". Should "Triple 2" be removed?

Thanks in advance.


Solution

RDF is a graph-based representation, and a graph (in this sense) is a set of edges. Sets, by definition, don't have duplicate elements. Of course, a specific serialization of an RDF graph could write out the same triple more than once, and there might be reasons why you would want to avoid that. As a note about terminology, the thing that you've called "Triple 1" is actually three triples:

group:row1  vocab:codice   "NA" .
group:row1  vocab:nome     "Napoli" .
group:row1  vocab:regione  "Campania" .

and what you've called "Triple 2" is actually two triples:

group:row1  vocab:codice  "NA" .
group:row1  vocab:nome    "Napoli" .
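
To make the set view concrete, here is a minimal Python sketch. It models each triple as a plain (subject, predicate, object) tuple of strings, which is a simplification for illustration and not how an RDF library stores data; the names group1, group2 and merged are made up for this example.

```python
# Each triple is modeled as a (subject, predicate, object) tuple.
# This is a simplification for illustration, not an RDF library.
group1 = {
    ("group:row1", "vocab:codice", "NA"),
    ("group:row1", "vocab:nome", "Napoli"),
    ("group:row1", "vocab:regione", "Campania"),
}
group2 = {
    ("group:row1", "vocab:codice", "NA"),
    ("group:row1", "vocab:nome", "Napoli"),
}

# A graph is a set of triples, so merging the two groups cannot
# produce duplicates.
merged = group1 | group2

print(group2 <= group1)  # True: every triple of group 2 is already in group 1
print(len(merged))       # 3
```

Because the second group is a subset of the first, the merged graph is just the first group: nothing needs to be removed at the graph level.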

At any rate: (i) it shouldn't actually be a problem that you have the same triples represented multiple times in your data; (ii) if you want to remove the repeats, then reading in the graph (with just about any RDF processing tool) and writing it out again should give you a representation without duplicated information. For instance, suppose you have the following as data.rdf:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:group="http://stackoverflow.com/q/23241612/1281433/group/"
    xmlns:vocab="http://stackoverflow.com/q/23241612/1281433/vocab/">
  <rdf:Description rdf:about="http://stackoverflow.com/q/23241612/1281433/group/row1">
    <vocab:regione>Campania</vocab:regione>
    <vocab:nome>Napoli</vocab:nome>
    <vocab:codice>NA</vocab:codice>
  </rdf:Description>
  <rdf:Description rdf:about="http://stackoverflow.com/q/23241612/1281433/group/row1">
    <vocab:nome>Napoli</vocab:nome>
    <vocab:codice>NA</vocab:codice>
  </rdf:Description>
</rdf:RDF>

Here's what you get when you read it in with Jena's rdfcat and write it out again:

$ rdfcat data.rdf
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:group="http://stackoverflow.com/q/23241612/1281433/group/"
    xmlns:vocab="http://stackoverflow.com/q/23241612/1281433/vocab/">
  <rdf:Description rdf:about="http://stackoverflow.com/q/23241612/1281433/group/row1">
    <vocab:regione>Campania</vocab:regione>
    <vocab:nome>Napoli</vocab:nome>
    <vocab:codice>NA</vocab:codice>
  </rdf:Description>
</rdf:RDF>
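
If Jena isn't at hand, the same read-then-write idea can be sketched with Python's standard library alone. This is only an illustration under a strong assumption: it handles flat rdf:Description elements with rdf:about and plain-literal properties, as in data.rdf above, and is not a general RDF/XML parser (a real RDF library should be used for anything else). The function name read_triples and the N-Triples-like output are choices made for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Same content as data.rdf above, inlined so the sketch is self-contained.
DATA = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:group="http://stackoverflow.com/q/23241612/1281433/group/"
    xmlns:vocab="http://stackoverflow.com/q/23241612/1281433/vocab/">
  <rdf:Description rdf:about="http://stackoverflow.com/q/23241612/1281433/group/row1">
    <vocab:regione>Campania</vocab:regione>
    <vocab:nome>Napoli</vocab:nome>
    <vocab:codice>NA</vocab:codice>
  </rdf:Description>
  <rdf:Description rdf:about="http://stackoverflow.com/q/23241612/1281433/group/row1">
    <vocab:nome>Napoli</vocab:nome>
    <vocab:codice>NA</vocab:codice>
  </rdf:Description>
</rdf:RDF>"""


def read_triples(xml_text):
    """Collect (subject, predicate, literal) triples into a set.

    Assumption: only flat rdf:Description elements with rdf:about and
    plain-literal property elements are present.
    """
    triples = set()
    root = ET.fromstring(xml_text)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes tags as "{namespace}local"; joining the
            # two parts gives the predicate IRI.
            namespace, local = prop.tag[1:].split("}")
            triples.add((subject, namespace + local, (prop.text or "").strip()))
    return triples


triples = read_triples(DATA)

# Writing the set back out yields each triple exactly once.
for s, p, o in sorted(triples):
    print(f'<{s}> <{p}> "{o}" .')
```

The five property elements in the input collapse to three triples, because adding an already-present tuple to a set has no effect.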
Licensed under: CC-BY-SA with attribution