Question

Running this query through the DBpedia SPARQL endpoint returns many results (with the institution column populated):

select ?person ?field ?institution where 

{
  ?person a dbpedia-owl:Agent .
  OPTIONAL { ?person dbpprop:workInstitution ?institution . }
  OPTIONAL { ?person dbpprop:workInstitutions ?institution .}
  ?person dbpprop:field ?field .
} 

However, adding the line FILTER(BOUND(?institution)) returns an empty result set:

select ?person ?field ?institution where 

{
  ?person a dbpedia-owl:Agent .
  OPTIONAL { ?person dbpprop:workInstitution ?institution . }
  OPTIONAL { ?person dbpprop:workInstitutions ?institution .}
  ?person dbpprop:field ?field .
  FILTER(BOUND(?institution))
} 

Why is this? I'd expect every result from the first query that has an institution to show up, but instead I get nothing.


Solution

Quick answer: this is a DBpedia/Virtuoso bug.

This situation is explicitly described in the presentation "An Introduction to SPARQL Optionals" by Julian Dolby and Kavitha Srinivas, on slide seven, where they use an example with

optional { ?x name ?label }
optional { ?x nick ?label }

For individuals who have a name value, we'll never see any of the nick values, because optional patterns are left associative, according to Section 6, "Including Optional Values", of the SPARQL specification. Once the first optional has bound ?label, the second optional can only match if it produces that same value. The authors conclude on slide eight that:

Multiple OPTIONAL clauses binding the same variable is rarely what you want.

You should get the results for the first optional part that matched. That provides a binding for the variable, so bound(...) should be true. As such, I'd say that the DBpedia behavior is a bug.

Experiments with other implementations

This is an interesting behavior, and we can reproduce it with simple data. Suppose we have some data like this:

@prefix : <http://stackoverflow.com/q/22478183/1281433/> .

:a :r :x ; :p 2 ; :q 3 .
:b :r :x ; :p 4 ; :q 5 .

Then we can run the following query with Jena and get the results shown below it. We only get results for the property :p because optional is left associative, so the pattern on :p is evaluated first, and each resource in our data has a value for :p.

prefix : <http://stackoverflow.com/q/22478183/1281433/>

select ?x ?v where { 
  ?x :r :x .
  optional { ?x :p ?v }
  optional { ?x :q ?v }
}
----------
| x  | v |
==========
| :b | 4 |
| :a | 2 |
----------

With Jena, adding a filter doesn't remove any results, which I think is the correct behavior, because ?v is bound.

prefix : <http://stackoverflow.com/q/22478183/1281433/>

select ?x ?v where { 
  ?x :r :x .
  optional { ?x :p ?v }
  optional { ?x :q ?v }
  filter(bound(?v))
}
----------
| x  | v |
==========
| :b | 4 |
| :a | 2 |
----------

Union or Property paths to the rescue!

The slides cited above mention that you can use union inside of the optional to get the results you're looking for. With the data I've provided, this means that you can do this:

prefix : <http://stackoverflow.com/q/22478183/1281433/>

select ?x ?v where { 
  ?x :r :x .
  optional { 
    { ?x :p ?v } union
    { ?x :q ?v }
  }
}
----------
| x  | v |
==========
| :b | 4 |
| :b | 5 |
| :a | 2 |
| :a | 3 |
----------

That works without a problem, but it can be made much more concise using property paths. If what you really want is to bind ?v to the value of either the :p or :q property, you can use an alternation property path:

prefix : <http://stackoverflow.com/q/22478183/1281433/>

select ?x ?v where { 
  ?x :r :x .
  optional { ?x :p|:q ?v }
  filter(bound(?v))
}
----------
| x  | v |
==========
| :b | 4 |
| :b | 5 |
| :a | 2 |
| :a | 3 |
----------

Of course, if you're doing filter(bound(?v)), then the pattern ?x :p|:q ?v really isn't optional anymore, so you should probably just move it into the main part of the query:

prefix : <http://stackoverflow.com/q/22478183/1281433/>

select ?x ?v where { 
  ?x :r :x ; :p|:q ?v 
}
----------
| x  | v |
==========
| :b | 4 |
| :b | 5 |
| :a | 2 |
| :a | 3 |
----------
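
Applied to the query from the question, the same idea might look like the sketch below. It assumes that the endpoint supports SPARQL 1.1 property paths and that the dbpedia-owl: and dbpprop: prefixes are predefined there, as the question's queries imply. It has not been run against the live DBpedia endpoint.

```sparql
# Sketch only: assumes property path support and predefined prefixes on the endpoint.
select ?person ?field ?institution where {
  ?person a dbpedia-owl:Agent ;
          dbpprop:field ?field ;
          # Either property may supply the institution; no OPTIONAL or FILTER needed.
          dbpprop:workInstitution|dbpprop:workInstitutions ?institution .
}
```

Because the institution pattern is now required, people with neither property are dropped, which is what FILTER(BOUND(?institution)) was meant to do.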

OTHER TIPS

The culprit is the double OPTIONAL on the same variable (?institution). What probably happens is that exactly one OPTIONAL succeeds, which means that the other one always fails, so the ?institution variable ends up both bound and not bound :)

You can work around it for example by the following query:

select ?person ?field ?institution

{
  ?person a dbpedia-owl:Agent .
  OPTIONAL { ?person dbpprop:workInstitution ?inst . }.
  OPTIONAL { ?person dbpprop:workInstitutions ?insts . }.
  BIND (IF(bound(?inst), ?inst, ?insts) AS ?institution )
  ?person dbpprop:field ?field .
  filter(bound(?institution)).
} 

It checks which case succeeds and binds it to the resulting variable ?institution.
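
The same workaround can probably be written a little more compactly with COALESCE, which returns its first argument that evaluates without an error (an unbound variable counts as an error). This is a sketch that assumes the endpoint supports the SPARQL 1.1 functions BIND and COALESCE; the helper variable names ?inst and ?insts are the ones used above.

```sparql
# Sketch only: same idea as the IF(bound(...)) version, using COALESCE instead.
select ?person ?field ?institution where {
  ?person a dbpedia-owl:Agent .
  ?person dbpprop:field ?field .
  OPTIONAL { ?person dbpprop:workInstitution ?inst . }
  OPTIONAL { ?person dbpprop:workInstitutions ?insts . }
  # Picks ?inst when it is bound, otherwise ?insts; stays unbound if neither is.
  BIND (COALESCE(?inst, ?insts) AS ?institution)
  FILTER (bound(?institution))
}
```

Note that, like the IF version, this prefers ?inst when both properties are present, so values that exist only under workInstitutions for such a person do not appear in ?institution.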

Yes, this case should be fixed. The compiled SQL contains two checks, one for the ?institution that comes from the first OPTIONAL and one for the ?institution that comes from the second OPTIONAL. The formally proper compilation should be either a nested subquery with a filter on its output or FILTER (bound (?institution_1) || bound (?institution_2)). A really proper compiler should report a warning about the weird query, but warnings are not supported by the SPARQL protocol :|

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow