Question

I have a simple SPARQL query:

SELECT DISTINCT ?class1 ?class2
    WHERE {
        ?class1 :child ?attribute1 .
        ?class2 :child ?attribute2 .
        ?attribute1 :objectName ?name1 .
        ?attribute2 :objectName ?name2 .
        FILTER (?name1 = ?name2)
    }

It runs against an RDF graph in which each of these 'classes' can have multiple children. I want to find duplicate classes, where duplicate means that all of the children (identified by objectName) of one class are also in another class.

What the query actually does is return every class that has at least one child that also exists in another class.

So I'm looking for some way of iterating over all children of each class, but I haven't found one yet. It would be great if someone could help.

Thanks


Solution 2

Here's some sample data with five classes. The first and second classes contain children with names "name1" and "name2". The third class contains "name1" and "name3", and the fourth contains "name3" and "name4". The fifth class contains all the children of the fourth, as well as "name5". So, the first and second classes are equivalent, and the fourth class is a subclass of the fifth.

@prefix : <http://example.org/> .

:class1 :child [ :objectName "name1" ] , 
               [ :objectName "name2" ] .

:class2 :child [ :objectName "name2" ] ,
               [ :objectName "name1" ] .

:class3 :child [ :objectName "name1" ] ,
               [ :objectName "name3" ] .

:class4 :child [ :objectName "name3" ] , 
               [ :objectName "name4" ] .

:class5 :child [ :objectName "name3" ] , 
               [ :objectName "name4" ] ,
               [ :objectName "name5" ] .

Your description sounds like you're actually looking for subclasses, since you mention classes all of whose children are also in another class. As such, this SPARQL query should take care of finding subclass relationships:

prefix : <http://example.org/>

select distinct ?c1 ?c2 where { 
  ?c1 :child [] .
  ?c2 :child [] .
  FILTER NOT EXISTS { ?c1 :child [ :objectName ?name ] .
                      FILTER NOT EXISTS { ?c2 :child [ :objectName ?name ] } }
  FILTER( !sameTerm( ?c1, ?c2 ) )
}

The nested NOT EXISTS patterns ensure that we only select pairs of classes for which there does not exist a child name in ?c1 that does not also exist in ?c2. That is, we reject any pair in which some element of ?c1 is not in ?c2; in other words, we reject every ?c1, ?c2 pair where ?c1 is not a subset of ?c2, so we keep just the ones where ?c1 is a subset of ?c2. The sameTerm filter removes the trivial ?c, ?c pairs, since every set is a subset of itself. Using Jena's command line ARQ tools, we get these results:

$ arq --data data.n3 --query query.sparql
---------------------
| c1      | c2      |
=====================
| :class4 | :class5 |
| :class2 | :class1 |
| :class1 | :class2 |
---------------------

As expected, :class1 and :class2 are each subsets of the other, and :class4 is a subset of :class5.
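
To make the double negation easier to follow, here is a minimal plain-Python sketch of the same logic. It is not SPARQL and does not touch an RDF store; the `children` dictionary, `is_subset`, and `subset_pairs` are hypothetical names that simply model the sample data as sets of objectName values.

```python
# Plain-Python model of the sample data (an assumption for illustration):
# each class maps to the set of objectName values of its children.
children = {
    "class1": {"name1", "name2"},
    "class2": {"name2", "name1"},
    "class3": {"name1", "name3"},
    "class4": {"name3", "name4"},
    "class5": {"name3", "name4", "name5"},
}


def is_subset(c1, c2):
    # Outer NOT EXISTS: there is no name among c1's children ...
    # Inner NOT EXISTS: ... for which c2 has no child with that name.
    return not any(name not in children[c2] for name in children[c1])


def subset_pairs():
    # The c1 != c2 test plays the role of the sameTerm filter.
    return sorted(
        (c1, c2)
        for c1 in children
        for c2 in children
        if c1 != c2 and is_subset(c1, c2)
    )


print(subset_pairs())
```

On this data the sketch yields the same three pairs as the query: class1/class2 in both directions, and class4 inside class5.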

If you want equivalent classes, it is sufficient to add a second NOT EXISTS to ensure that ?c2 is also a subset of ?c1:

prefix : <http://example.org/>

select distinct ?c1 ?c2 where { 
  ?c1 :child [] .
  ?c2 :child [] .
  FILTER NOT EXISTS { ?c1 :child [ :objectName ?name ] .
                      FILTER NOT EXISTS { ?c2 :child [ :objectName ?name ] } }
  FILTER NOT EXISTS { ?c2 :child [ :objectName ?name ] .
                      FILTER NOT EXISTS { ?c1 :child [ :objectName ?name ] } }
  FILTER( !sameTerm( ?c1, ?c2 ) )
}

With this query, we get back just :class1 and :class2:

$ arq --data data.n3 --query query.sparql
---------------------
| c1      | c2      |
=====================
| :class2 | :class1 |
| :class1 | :class2 |
---------------------
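
If the names have already been pulled out of the store, equivalence can also be checked outside SPARQL by grouping classes on their set of names. The following is only a sketch under that assumption; `children` and `duplicate_groups` are hypothetical names, and the data is the sample from above modelled as plain Python sets.

```python
from collections import defaultdict

# Sample data modelled as class -> set of child objectName values (an assumption).
children = {
    "class1": {"name1", "name2"},
    "class2": {"name2", "name1"},
    "class3": {"name1", "name3"},
    "class4": {"name3", "name4"},
    "class5": {"name3", "name4", "name5"},
}


def duplicate_groups(children):
    # Two classes are equivalent exactly when their name sets are equal,
    # so a frozenset of the names works as a grouping key.
    groups = defaultdict(list)
    for cls, names in children.items():
        groups[frozenset(names)].append(cls)
    # Keep only groups that contain more than one class.
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


print(duplicate_groups(children))  # [['class1', 'class2']]
```

Unlike the query, this reports each set of duplicates once rather than as ordered pairs.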

OTHER TIPS

Note that SPARQL is a declarative language, not an imperative one, so there's no concept of iterating over things. You could do that through an API on top of an RDF store, but at least in theory expressing it in SPARQL will be more efficient.

I think what you need to do is find all the combinations of classes, and subtract the ones where some objectName differs.

The following is totally untested!

SELECT DISTINCT ?class1 ?class2
WHERE {
   ?class1 :child ?attribute1 .
   ?class2 :child ?attribute2 .
   MINUS {
      ?attribute1 :objectName ?name1 .
      ?attribute2 :objectName ?name2 .
      FILTER (?name1 != ?name2 && ?attribute1 = ?attribute2 )
   }
}

There's a very small chance that's correct :) but it should give you some inspiration.
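
One possible reading of the "all combinations minus the mismatches" idea can be sketched in plain Python. This is not a translation of the query above and says nothing about whether that query is correct; `children` and `pairs_without_mismatch` are hypothetical names, and the data is an assumed small example.

```python
from itertools import permutations

# Assumed example data: class -> set of child objectName values.
children = {
    "class1": {"name1", "name2"},
    "class2": {"name2", "name1"},
    "class3": {"name1", "name3"},
}


def pairs_without_mismatch(children):
    # Start from every ordered pair of distinct classes ...
    all_pairs = set(permutations(children, 2))
    # ... collect the pairs where some name appears on only one side ...
    mismatched = {
        (c1, c2)
        for c1, c2 in all_pairs
        if children[c1] ^ children[c2]  # non-empty symmetric difference
    }
    # ... and subtract them, leaving only pairs with identical name sets.
    return sorted(all_pairs - mismatched)


print(pairs_without_mismatch(children))
```

Under this reading only the class1/class2 pair survives, in both orders.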

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow