Question

One of my coworkers needs to get two sets of RDF triples involving term associations. The terms come from a list, and the associations come from a set of triples using those terms.

  • The first set is all triples with any item in the term list being either the triple's subject or object.

  • The second set is all triples with any two terms being either one or two predicates distant from each other, where the predicates are not necessarily bidirectional. So, for s1 and s2 in the term list, two triples s1 → s3 and s2 → s3 would be valid.

I think I have answers already, but I wanted to ask anyway, both to contribute to the pool of SPARQL questions and to check myself.

Solution

Given data like this:

@prefix : <urn:ex:> .

:a :p :b .
:a :p :e .

:b :p :c .
:b :p :d .

:c :p :a .
:c :p :f .

:d :p :a .
:d :p :d .

if we take (:b :c) as the set of interesting terms, the following query will find all the triples that you're interested in. Note that the condition for the first set, i.e., that either ?s or ?o of a triple ?s ?p ?o is in the term list, also covers part of the second set, namely the part where two terms are directly connected, i.e., where both ?s and ?o are in the term list.

prefix : <urn:ex:>

select distinct ?s ?p ?between ?q ?o where { 
  # term list appearing twice in order to 
  # get all pairs of items
  values ?x { :b :c }
  values ?y { :b :c }

  # This handles the first set (where either the subject or
  # object is from the term list).  Note that the first set
  # includes part of the second set;  when two terms from 
  # the list are separated by just one predicate, then it's
  # a case where either the subject or object are from the
  # term list (since *both* are).
  { ?s ?p ?x bind(?x as ?o)} UNION { ?x ?p ?o bind(?x as ?s)}

  UNION 

  # The rest of the second set is when two terms from the
  # list are connected by a path of length two.  This is 
  # a straightforward pattern to write.
  { ?x ?p ?between . ?between ?q ?y .
    bind(?x as ?s)
    bind(?y as ?o) }
}

In the results, single triples are the rows in which just s, p, and o are bound. These cover your first set, as well as the "distance = 1" portion of your second set. The rest of the second set also binds between and q. In terms of the example in your question, between is s3.

$ arq --data data.n3 --query query.sparql
-------------------------------
| s  | p  | between | q  | o  |
===============================
| :a | :p |         |    | :b |
| :b | :p |         |    | :d |
| :b | :p |         |    | :c |
| :c | :p |         |    | :f |
| :c | :p |         |    | :a |
| :c | :p | :a      | :p | :b |
-------------------------------
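If only the first set is wanted, one possible sketch that avoids the UNION is to match every triple and keep those whose subject or object is in the term list. It assumes the same example data and term list as above. Because it matches all triples before filtering, it may be slower than the VALUES-based query on large data.

```sparql
prefix : <urn:ex:>

select ?s ?p ?o where {
  ?s ?p ?o .
  # keep a triple when either end is one of the terms of interest
  filter( ?s in (:b, :c) || ?o in (:b, :c) )
}
```

On the example data this should return the five single-triple rows from the table above, each once, even when both ends are terms.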

Update based on Comment

Given the example in the comment, I think that this query can be shortened dramatically to the following:

prefix : <urn:ex:>

select distinct ?x ?p ?between ?q ?y where { 
  values ?x { :b :c }
  values ?y { :b :c }

  { ?x ?p ?between } UNION { ?between ?p ?x }
  { ?between ?q ?y } UNION { ?y ?q ?between }
}

Once we bind ?x ?p ?between or ?between ?p ?x, we're just saying that there's an edge (in either direction) between ?x and ?between. ?y and ?q extend that path so we have:

?x --?p-- ?between --?q-- ?y

where the actual directions of --?p-- and --?q-- could be left or right. This covers all the cases we need. It's probably not hard to see why paths of length two will match this pattern, but the case for triples in which just the subject or object is a special term merits elaboration. Given a triple

<term> <prop> <non-term>

we can get the path

<term> --<prop>-- <non-term> --<prop>-- <term>

and the same holds when <term> is the object and <non-term> is the subject. The pattern also covers the case in which both the subject and the object are terms. On the data above, the results are:

$ arq --data data.n3 --query paths.sparql
-------------------------------
| x  | p  | between | q  | y  |
===============================
| :b | :p | :d      | :p | :b |   
| :b | :p | :c      | :p | :b |   
| :b | :p | :a      | :p | :b |
| :c | :p | :a      | :p | :b |
| :b | :p | :a      | :p | :c |
| :c | :p | :f      | :p | :c |
| :c | :p | :a      | :p | :c |
| :c | :p | :b      | :p | :c |
-------------------------------
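When the predicates do not need to be reported and are known in advance, the "edge in either direction" idea can also be sketched with SPARQL 1.1 property paths. The version below assumes that :p is the only predicate of interest. That holds for the example data but is an assumption in general. Property paths cannot contain variables, so ?p and ?q are no longer available in the results.

```sparql
prefix : <urn:ex:>

select distinct ?x ?between ?y where {
  values ?x { :b :c }
  values ?y { :b :c }

  # (:p|^:p) matches a :p edge in either direction
  ?x (:p|^:p) ?between .
  ?between (:p|^:p) ?y .
}
```

The trade-off is brevity against detail: the query is shorter, but the direction and identity of each edge have to be recovered separately if they matter.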

If we add some information about which way ?p and ?q were pointing, we can reconstruct the paths:

prefix : <urn:ex:>

select distinct ?x ?p ?pdir ?between ?q ?qdir ?y where { 
  values ?x { :b :c }
  values ?y { :b :c }

  { ?x ?p ?between bind("right" as ?pdir)} UNION { ?between ?p ?x bind("left" as ?pdir)}
  { ?between ?q ?y bind("right" as ?qdir)} UNION { ?y ?q ?between bind("left" as ?qdir)}
}

This gives output:

$ arq --data data.n3 --query paths.sparql
---------------------------------------------------
| x  | p  | pdir    | between | q  | qdir    | y  |
===================================================
| :b | :p | "right" | :d      | :p | "left"  | :b |   # :b -> :d
| :b | :p | "right" | :c      | :p | "left"  | :b |   # :b -> :c 
| :b | :p | "left"  | :a      | :p | "right" | :b |   # :a -> :b 
| :c | :p | "right" | :a      | :p | "right" | :b |   # :c -> :a -> :b
| :b | :p | "left"  | :a      | :p | "left"  | :c |   # :c -> :a -> :b 
| :c | :p | "right" | :f      | :p | "left"  | :c |   # :c -> :f 
| :c | :p | "right" | :a      | :p | "left"  | :c |   # :c -> :a 
| :c | :p | "left"  | :b      | :p | "right" | :c |   # :b -> :c 
---------------------------------------------------

There's a repeat of the c -> a -> b path, but that could probably be filtered out.
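One way to drop the mirrored row, sketched here under the assumption that all terms are IRIs, is to keep a single orientation of each pair by comparing the string forms of ?x and ?y:

```sparql
prefix : <urn:ex:>

select distinct ?x ?p ?pdir ?between ?q ?qdir ?y where {
  values ?x { :b :c }
  values ?y { :b :c }

  # keep only one ordering of each (?x, ?y) pair; ?x = ?y is still allowed
  filter( str(?x) <= str(?y) )

  { ?x ?p ?between bind("right" as ?pdir) } UNION { ?between ?p ?x bind("left" as ?pdir) }
  { ?between ?q ?y bind("right" as ?qdir) } UNION { ?y ?q ?between bind("left" as ?qdir) }
}
```

With the example data this removes the row that starts at :c and ends at :b, leaving its mirror image that starts at :b and ends at :c. Rows in which ?x and ?y are the same term are unaffected.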

If you're actually looking for the set of triples here, and not the particular paths, you can use a construct query which gives you a graph back (since a set of triples is a graph):

prefix : <urn:ex:>

construct {
  ?s1 ?p ?o1 .
  ?s2 ?q ?o2 .
}
where { 
  values ?x { :b :c }
  values ?y { :b :c }

  { ?x ?p ?between .
    bind(?x as ?s1)
    bind(?between as ?o1) }
  UNION
  { ?between ?p ?x .
    bind(?between as ?s1)
    bind(?x as ?o1)}

  { ?between ?q ?y .
    bind(?between as ?s2)
    bind(?y as ?o2) }
  UNION 
  { ?y ?q ?between .
    bind(?y as ?s2)
    bind(?between as ?o2)}
}

$ arq --data data.n3 --query paths-construct.sparql
@prefix :        <urn:ex:> .

<urn:ex:b>
      <urn:ex:p>    <urn:ex:c> ;
      <urn:ex:p>    <urn:ex:d> .

<urn:ex:c>
      <urn:ex:p>    <urn:ex:f> ;
      <urn:ex:p>    <urn:ex:a> .

<urn:ex:a>
      <urn:ex:p>    <urn:ex:b> .

Other tips

You can make use of UNION in your queries. In either case, you have a set of patterns you are looking for, and you want to collect information from a UNION of those patterns.

For the first set, getting all triples containing a list item in either subject or object:

SELECT ?s ?p ?o # result triples
WHERE
{
  # get a term bound to ?term
  GRAPH <urn:termsList/>
  { ?term a <urn:types/word> } # or however the terms are stored

  # match ?term against the basic patterns
  GRAPH <urn:associations/>
  {
    {
      ?term ?p ?o . # basic pattern #1
      BIND(?term AS ?s) # so that ?term shows up in the results
    }
    UNION # take ?term as either subject or object
    { 
      ?s ?p ?term . # basic pattern #2
      BIND(?term AS ?o)
    } 
  }
}

First get a binding of all the terms (?term a …).
Then match it against the basic patterns:

?term ?p ?o

and

?s ?p ?term.

After each pattern match, use a binding to place ?term in its proper place among the results. For example, the first pattern has just bound ?p and ?o, so their corresponding ?s needs to be bound next, otherwise it will just show up blank.

For the second set, first we get two words from the list. We want a many-to-many matching:

?term1 a … .
?term2 a … .

The basic patterns:

?term1 ?p1 ?term2

?term1 ?p1 ?term .
?term2 ?p2 ?term .

?term1 ?p1 ?term .
?term ?p2 ?term2 .

?term ?p1 ?term1 .
?term ?p2 ?term2 .

Add a filter on each of the last three to ensure ?term1 and ?term2 are not the same:

FILTER(!SAMETERM(?term1, ?term2))

(We could put these filters outside of all the unions, but it is more efficient to filter variables locally, before further using them.)

Finally UNION the results together:

SELECT ?s1 ?p1 ?o1 ?s2 ?p2 ?o2
WHERE
{
  GRAPH <urn:termsList/>
  { 
    ?term1 a <urn:types/word> . # outer loop variable
    ?term2 a <urn:types/word> . # inner loop variable
  }
  GRAPH <urn:associations/>
  {
    {
      # Only need to check one direction; either end gets 
      # matched into ?term1 at some point
      ?term1 ?p1 ?term2 .
      BIND (?term1 AS ?s1) .
      BIND (?term2 AS ?o1) . # Note we leave ?s2, ?p2, ?o2 unbound here
    }
    UNION
    { 
      ?term1 ?p1 ?term .
      ?term2 ?p2 ?term .
      FILTER(!SAMETERM(?term1, ?term2))
      BIND(?term1 AS ?s1) .
      BIND(?term AS ?o1) .
      BIND(?term2 AS ?s2) .
      BIND(?term AS ?o2)
    }
    UNION
    { 
      ?term1 ?p1 ?term .
      ?term ?p2 ?term2 .
      FILTER(!SAMETERM(?term1, ?term2))
      BIND(?term1 AS ?s1) .
      BIND(?term AS ?o1) .
      BIND(?term AS ?s2) .
      BIND(?term2 AS ?o2)
    }
    UNION
    { 
      ?term ?p1 ?term1 .
      ?term ?p2 ?term2 .
      FILTER(!SAMETERM(?term1, ?term2))
      BIND(?term AS ?s1) .
      BIND(?term1 AS ?o1) .
      BIND(?term AS ?s2) .
      BIND(?term2 AS ?o2)
    }
  }
}

We will test the queries on the following data. The word list:

# For God so loved the world, that he gave his only begotten Son, that 
# whosoever believeth in him should not perish, but have everlasting life.
# John 3:16

@prefix : <urn:terms/> .
@prefix t: <urn:types/> .

:For a t:word .
:God a t:word .
:so a t:word .
:loved a t:word .
:the a t:word .
:world a t:word .
:that a t:word .
:he a t:word .
:gave a t:word .
:his a t:word .
:only a t:word .
:begotten a t:word .
:Son a t:word .
:that a t:word .
:whosoever a t:word .
:believeth a t:word .
:in a t:word .
:him a t:word .
:should a t:word .
:not a t:word .
:perish a t:word .
:but a t:word .
:have a t:word .
:everlasting a t:word .
:life a t:word .

And an association list:

# For the wages of sin is death; but the gift of God is eternal life through 
# Jesus Christ our Lord.
# Romans 6:23

@prefix : <urn:terms/> .
@prefix g: <urn:grammar/> .

:For g:clauseAt :wages ;
     g:nextClauseHeadAt :but .
:the g:describes :wages .
:wages g:predicate :is .
:of g:describes :wages ;
    g:nominative :sin .
:is g:object :death .

:but g:clauseAt :gift .
:the g:describes :gift .
:gift g:predicate :is .
:of g:describes :gift ;
    g:nominative :God .
:is g:object :life .
:eternal g:describes :life .
:through g:describes :is ;
         g:nominative :Jesus .
:Christ g:describes :Jesus .
:our g:describes :Lord .
:Lord g:describes :Jesus .

Query 1:

----------------------------------------------------------------------------
| s                   | p                              | o                 |
============================================================================
| <urn:terms/For>     | <urn:grammar/nextClauseHeadAt> | <urn:terms/but>   |
| <urn:terms/For>     | <urn:grammar/clauseAt>         | <urn:terms/wages> |
| <urn:terms/of>      | <urn:grammar/nominative>       | <urn:terms/God>   |
| <urn:terms/is>      | <urn:grammar/object>           | <urn:terms/life>  |
| <urn:terms/eternal> | <urn:grammar/describes>        | <urn:terms/life>  |
| <urn:terms/but>     | <urn:grammar/clauseAt>         | <urn:terms/gift>  |
| <urn:terms/For>     | <urn:grammar/nextClauseHeadAt> | <urn:terms/but>   |
| <urn:terms/the>     | <urn:grammar/describes>        | <urn:terms/gift>  |
| <urn:terms/the>     | <urn:grammar/describes>        | <urn:terms/wages> |
----------------------------------------------------------------------------

Query 2:

----------------------------------------------------------------------------------------------------------------------------------------
| s1              | p1                             | o1                | s2              | p2                      | o2                |
========================================================================================================================================
| <urn:terms/For> | <urn:grammar/nextClauseHeadAt> | <urn:terms/but>   |                 |                         |                   |
| <urn:terms/For> | <urn:grammar/clauseAt>         | <urn:terms/wages> | <urn:terms/the> | <urn:grammar/describes> | <urn:terms/wages> |
| <urn:terms/but> | <urn:grammar/clauseAt>         | <urn:terms/gift>  | <urn:terms/the> | <urn:grammar/describes> | <urn:terms/gift>  |
| <urn:terms/the> | <urn:grammar/describes>        | <urn:terms/wages> | <urn:terms/For> | <urn:grammar/clauseAt>  | <urn:terms/wages> |
| <urn:terms/the> | <urn:grammar/describes>        | <urn:terms/gift>  | <urn:terms/but> | <urn:grammar/clauseAt>  | <urn:terms/gift>  |
----------------------------------------------------------------------------------------------------------------------------------------

Note that there is some redundancy here. It comes from the double-loop way in which values are bound to ?term1 and ?term2: every pair is matched twice, once with the roles of ?term1 and ?term2 swapped. If this is unacceptable, you can simply change the first line of the query to

SELECT DISTINCT ?s1 ?p1 ?o1

This, of course, renders the BINDings for ?s2 and ?o2 unnecessary, since they are bound only for the SELECT.
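An alternative that keeps both triples in each row, offered as a sketch rather than a tested replacement, is to impose an order on the two terms in the symmetric branches (the ones where ?term1 and ?term2 play the same role), so that each pair is matched only once. The query below shows this for the branch where both terms point at a shared node. It assumes the terms are IRIs, so that STR() yields comparable strings.

```sparql
SELECT ?s1 ?p1 ?o1 ?s2 ?p2 ?o2
WHERE
{
  GRAPH <urn:termsList/>
  {
    ?term1 a <urn:types/word> .
    ?term2 a <urn:types/word> .
  }
  GRAPH <urn:associations/>
  {
    ?term1 ?p1 ?term .
    ?term2 ?p2 ?term .
    # a strict order replaces !SAMETERM: it rules out ?term1 = ?term2
    # and also keeps only one of the two mirrored matches
    FILTER(STR(?term1) < STR(?term2))
    BIND(?term1 AS ?s1)
    BIND(?term AS ?o1)
    BIND(?term2 AS ?s2)
    BIND(?term AS ?o2)
  }
}
```

The same change would apply to the branch where a shared node points at both terms. The chained branch (?term1 to ?term to ?term2) is not symmetric, so its !SAMETERM filter should stay as it is.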

"For if we have been united with [Christ] in a death like his, we shall certainly be united with him in a resurrection like his" (Romans 6:5 ESV).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow