Alternatives to SPARQL query with lots of UNIONs

https://stackoverflow.com/questions/15035337

11-03-2022

Question

I have some named graphs stored in Virtuoso, and I want to find the one that matches the highest number of terms from a provided list.

My query is constructed programmatically and looks like this:

SELECT DISTINCT ?graph (count(DISTINCT ?match) as ?matches)
WHERE {
  GRAPH ?graph {
    {?match rdf:label "term 1"} 
     UNION {?match rdf:label "term 2"} 
     UNION {?match rdf:label "term 3"}
     ...
  }
}
ORDER BY DESC(?matches)

Each term becomes another UNION clause.

Is there a better way to do this? The query gets long and ugly fast, and Virtuoso complains when there are too many terms.

Solution 2

(Note: the property is rdfs:label, not rdf:label.)

An alternative way to write it is:

{ ?match rdfs:label ?x . FILTER (?x in ("term 1", "term 2", "term 3")) }

or (SPARQL 1.0)

{ ?match rdfs:label ?x . FILTER ( ?x = "term 1" || ?x = "term 2" || ?x = "term 3" )  }

OTHER TIPS

In SPARQL 1.1, there's a values clause that can help out with this. It lets you write:

select ?match where {
  values ?label { "term 1" "term 2" "term 3" }
  ?match rdfs:label ?label
}
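
Applied to the query from the question, this could look roughly like the sketch below. It assumes the standard rdfs prefix and adds a GROUP BY ?graph, which SPARQL 1.1 requires when a plain variable is selected alongside an aggregate; whether Virtuoso handles a long VALUES list better than many UNIONs is not something verified here.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Count, per named graph, how many distinct subjects carry one of the labels.
SELECT ?graph (COUNT(DISTINCT ?match) AS ?matches)
WHERE {
  # One entry per search term; the list is generated programmatically.
  VALUES ?label { "term 1" "term 2" "term 3" }
  GRAPH ?graph {
    ?match rdfs:label ?label
  }
}
GROUP BY ?graph
ORDER BY DESC(?matches)
```

Only the VALUES line grows as terms are added, so the rest of the query stays fixed however long the list gets.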

The VALUES solution is even more powerful because it allows the use of UNDEF, for example:

VALUES (?s ?p ?o) { (<http://abc#X> <http://abc#P1> UNDEF)
                    (UNDEF <http://abc#P2> <http://abc#Y>) }

UNDEF acts as a wildcard: the result is the union of the triples matched by each row of values individually. For large datasets, however, this might be too slow.
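
For the wildcard behaviour to show up, the VALUES block has to be joined with a triple pattern over the same variables. A minimal sketch, reusing the placeholder IRIs from the fragment above:

```sparql
# Each VALUES row constrains the triple pattern; UNDEF leaves that
# variable unbound in the row, so the pattern is free to bind it.
SELECT ?s ?p ?o
WHERE {
  VALUES (?s ?p ?o) {
    (<http://abc#X> <http://abc#P1> UNDEF)
    (UNDEF <http://abc#P2> <http://abc#Y>)
  }
  ?s ?p ?o
}
```

The first row returns every object of <http://abc#P1> on <http://abc#X>. The second returns every subject that has <http://abc#Y> as the value of <http://abc#P2>.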

Licensed under: CC-BY-SA with attribution
