Jena/Sparql/Arq: injecting some statements in the model during the query

https://stackoverflow.com/questions/12500245

02-07-2021

Question

I have built a small RDF model; it contains only a few triples describing some items on the human genome.

I want to retain only those items that overlap some genomic segments (say, a "gene") stored in a separate relational database. This database of genes is far too big to be inserted into my initial RDF model.

Is there any way to extend ARQ to inject new statements (the RDF statements describing only the genes that overlap the items) into my model during the query?

input:

uri:object1  my:hasChromosome "chr1" .
uri:object1  my:hasStartPosition "1235689887" .
uri:object1  my:hasEndPosition "2897979879" .
uri:object1  dc:title "my variation" .

output:

uri:object1  my:hasChromosome "chr1" .
uri:object1  my:hasStartPosition "1235689887" .
uri:object1  my:hasEndPosition "2897979879" .
uri:object1  dc:title "my variation" .
uri:gene1  dc:title "GeneName" .

I've read http://jena.sourceforge.net/ARQ/arq-query-eval.html but I'm lost: which extension mechanism should I choose? A property function? Is there a more complete example on the web?

Thanks,

Solution

You have two data stores: a small dataset in a Jena in-memory Model, and a large set of gene-related data in a relational database. You want to write a SPARQL query as if the large data set were local, without actually importing it. (The actual data transformation you want is a bit vague.)

In SPARQL 1.1 you can do this with the SERVICE keyword, which lets one query reach across SPARQL endpoints. To use your relational gene database as a SPARQL endpoint, you need either a SPARQL-to-SQL translator such as D2RQ, or to convert the data to RDF and load it into a general-purpose, SPARQL-capable triple store.

Once the gene data is available at a SPARQL endpoint, you can run an update along these lines:

PREFIX my: <...>

INSERT { ?missing a my:Gene } # mark a region as a gene
WHERE {
    ?missing my:hasChromosome ?chr ;
             my:hasStartPosition ?start ;
             my:hasEndPosition ?end .
    SERVICE <http://localhost:????/gene_data/sparql> {
        ?gene a my:Gene ;
              my:hasStartPosition ?gStart ;
              my:hasEndPosition ?gEnd .
    }
    # Detect overlap. The FILTER sits outside SERVICE because ?start and ?end
    # are bound only by the local pattern, not at the remote endpoint.
    FILTER( !(?start > ?gEnd || ?end < ?gStart) )
}
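
The overlap condition in the FILTER above can be checked in isolation. The following is a minimal Java sketch of the same test on closed intervals; the class and method names are illustrative only and are not part of Jena. It also parses the positions from strings, because the sample data stores them as plain string literals, and comparing strings with < or > orders them lexically rather than numerically (a cast such as xsd:integer(?start) in the query is one possible way around that).

```java
// Sketch only: IntervalOverlap is a hypothetical helper, not a Jena class.
final class IntervalOverlap {

    /** Mirrors FILTER( !(?start > ?gEnd || ?end < ?gStart) ) on closed intervals. */
    static boolean overlaps(long start, long end, long gStart, long gEnd) {
        return !(start > gEnd || end < gStart);
    }

    /**
     * Same test for positions held as strings, as in the sample data.
     * long is used because a value such as 2897979879 does not fit in an int.
     */
    static boolean overlaps(String start, String end, String gStart, String gEnd) {
        return overlaps(Long.parseLong(start), Long.parseLong(end),
                        Long.parseLong(gStart), Long.parseLong(gEnd));
    }
}
```

Two regions that merely touch at one coordinate count as overlapping under this condition, since the comparison is strict on both sides.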

The other option is to do the filtering with a custom function, as @user205512 shows, where the function's Java code uses JDBC to connect to the relational database.
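
As a rough illustration of the JDBC side of such a custom function, here is a minimal sketch of the database lookup. The table and column names (genes, chrom, start_pos, end_pos) and the class name are assumptions made for this example; the real gene database will have its own schema. The wiring into ARQ as a filter function is deliberately left out.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch only: GeneOverlapLookup and the schema it queries are hypothetical.
final class GeneOverlapLookup {

    // A gene overlaps [start, end] when it starts no later than the region's
    // end and ends no earlier than the region's start.
    static final String OVERLAP_SQL =
        "SELECT 1 FROM genes WHERE chrom = ? AND start_pos <= ? AND end_pos >= ?";

    /** Returns true if at least one gene row overlaps the given region. */
    static boolean overlaps(Connection conn, String chrom, long start, long end)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(OVERLAP_SQL)) {
            ps.setString(1, chrom);
            ps.setLong(2, end);    // gene start must be <= region end
            ps.setLong(3, start);  // gene end must be >= region start
            ps.setMaxRows(1);      // existence check only, one row is enough
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }
}
```

Because the function would be evaluated once per candidate solution, reusing one connection (or a pool) rather than opening a new one on every call is probably worth considering.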

Other tips

Details are a bit thin here. Start simple, using a custom function. That will let you do external lookups in FILTERs or, using BIND, retrieve values.

For updating you might want to consider SPARQL Update.

Finally, you said:

"I want to retain only those items overlapping some genomic segments (say a "gene"), stored in another relational database."

So perhaps something like:

PREFIX my: <...>
PREFIX f:  <java:com.example.DBFunctions.>

DELETE { ?missing ?p ?o } # Purge the non-overlapping objects
WHERE {
    ?missing my:hasChromosome ?chr ;
             my:hasStartPosition ?start ;
             my:hasEndPosition ?end ;
             ?p ?o . # bind every triple of the object so DELETE can remove it
    FILTER (!f:overlaps(?chr, ?start, ?end)) # passes only when no gene overlaps
}

Ok, I'm guessing here but I hope that helps a little.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow