Question

I am a beginner with SPARQL. I have written this query:

prefix pp: <http://purl.org/dc/elements/1.1/>
select ?title,?autor1, ?autor2
from <http://gutenberg.lib>
where {
      ?s pp:title ?title.
      ?s pp:creator ?ID1.
      ?ID1 ?p ?autor1.
      optional{ ?s pp:creator ?ID2.
                ?ID2 ?p ?autor2.
              }
} order by ?s

and I ran it against data from Project Gutenberg. The data has this form:

 S1 pp:title "TITLE11"
 S1 pp:creator "CREATOR11"
 S1 pp:creator "CREATOR12"
 S2 pp:title "TITLE21"
 S2 pp:creator "CREATOR21"
 S2 pp:creator "CREATOR22"
 S2 pp:creator "CREATOR23"

etc

I expected to get something like this:

 TITLE11, CREATOR11, CREATOR11
 TITLE11, CREATOR11, CREATOR12
 TITLE11, CREATOR12, CREATOR11
 TITLE11, CREATOR12, CREATOR12

but I got something like this:

 TITLE11, CREATOR11, CREATOR11
 TITLE11, CREATOR12, CREATOR12

so there is no Cartesian product as there would be in SQL.

Is that a bug in Virtuoso or a feature?

Please note that ?p in the fragment ?ID1 ?p ?autor1. is there because the data has no "author real name" property. Gutenberg only gives URIs like http://www.w3.org/1999/02/22-rdf-syntax-ns#_1 for the first author, http://www.w3.org/1999/02/22-rdf-syntax-ns#_2 for the second, etc.


For example, with real data it looks like this:

The Mystery     http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag  http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag

The Mystery     White, Stewart Edward, 1873-1946    White, Stewart Edward, 1873-1946

The Mystery     Adams, Samuel Hopkins, 1871-1958    Adams, Samuel Hopkins, 1871-1958

and there are no other (title, author1, author2) rows for the book "The Mystery".


Solution

Literals can't be subjects

You're not showing us the data or the results exactly. If the data is actually of the form:

S1 pp:title "TITLE11"
S1 pp:creator "CREATOR11"
S1 pp:creator "CREATOR12"

where the values of the creator property are strings, then you shouldn't get any matches for

?s pp:creator ?ID1.
?ID1 ?p ?autor1.

because ?ID1 would be bound to a string, and then you can't have any matches for the second line, because strings can't be subjects of RDF triples.
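
One way to find out which case applies is to look at what pp:creator actually points to in your store. The following is only a minimal sketch, assuming the graph name and the pp: prefix from the question; the ?literal and ?blank column names are made up for illustration.

```sparql
prefix pp: <http://purl.org/dc/elements/1.1/>

# For a sample of works, report whether the pp:creator value is a
# literal or a blank node (if neither, it is an IRI).
# Graph name and prefix are taken from the question.
select ?s ?ID1 (isLiteral(?ID1) as ?literal) (isBlank(?ID1) as ?blank)
from <http://gutenberg.lib>
where {
  ?s pp:creator ?ID1 .
}
limit 20
```

If ?literal is true for a row, then ?ID1 ?p ?autor1 cannot match for that creator; if it is false, the creator is a resource (such as an rdf:Bag node) that can have properties of its own.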

Rewriting the query

I downloaded rdf-files.tar.bz2 from the Current RDF Format section of the RDF data available from Project Gutenberg. After noting that The Mystery has ebook number 10008, I opened the file cache/epub/10008/pg10008.rdf, where I see this data (abbreviated to the relevant parts):

<http://www.gutenberg.org/ebooks/10008>
        dcterms:creator    <http://www.gutenberg.org/2009/agents/1635> , <http://www.gutenberg.org/2009/agents/247> ;
        dcterms:title      "The Mystery" .

<http://www.gutenberg.org/2009/agents/1635>
        pgterms:alias      "Fabian, Warner" ;
        pgterms:name       "Adams, Samuel Hopkins" .

<http://www.gutenberg.org/2009/agents/247>
        pgterms:name       "White, Stewart Edward" .

Notably, I don't see any use of rdf:Bag in that file. Perhaps you're using the legacy RDF format that's also available for download. If you're committed to using that, please add a comment, and we can make that work, too, but it seems beneficial to use the newer data where it's available, so I'll continue with this data.
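
For the legacy layout as the question describes it (pp:creator pointing to an rdf:Bag node whose members hang off rdf:_1, rdf:_2, and so on), the missing combinations are explained by the shared ?p. The optional block reuses the ?p that was already bound in ?ID1 ?p ?autor1, so ?autor2 can only be reached through the same membership property, and each row pairs an author with itself. A rough sketch that uses separate property variables follows; it assumes that layout and the graph name from the question, and ?bag1, ?bag2, ?p1, and ?p2 are names chosen here for illustration.

```sparql
prefix pp:  <http://purl.org/dc/elements/1.1/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

select ?title ?autor1 ?autor2
from <http://gutenberg.lib>
where {
  ?s pp:title ?title ;
     pp:creator ?bag1 .
  ?bag1 ?p1 ?autor1 .
  filter( ?p1 != rdf:type )      # skip the "a rdf:Bag" type triple
  optional {
    ?s pp:creator ?bag2 .
    ?bag2 ?p2 ?autor2 .          # ?p2 is independent of ?p1
    filter( ?p2 != rdf:type )
  }
}
order by ?s
```

Because ?p1 and ?p2 are bound independently, every member can pair with every other member, including itself. Works whose pp:creator is a plain literal are still not matched by this pattern, for the reason given earlier.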

If you want each title listed with each combination of authors, you can use a query like the following to get your results. (I notice that you said you expected the repeated authors. That seems a bit unusual to me, so I've added a filter to remove those, but you can simply remove the filter if you really do want ?name_i and ?name_j to be able to be bound to the same value.)

prefix dcterms: <http://purl.org/dc/terms/> 
prefix pgterms: <http://www.gutenberg.org/2009/pgterms/> 

select ?title ?name_i ?name_j where {
  ?work dcterms:title ?title ;
        dcterms:creator ?creator_i .
  ?creator_i pgterms:name ?name_i .
  optional { 
    ?work dcterms:creator ?creator_j .
    ?creator_j pgterms:name ?name_j .
    filter( ?creator_i != ?creator_j )
  }
}

---------------------------------------------------------------------
| title         | name_i                  | name_j                  |
=====================================================================
| "The Mystery" | "Adams, Samuel Hopkins" | "White, Stewart Edward" |
| "The Mystery" | "White, Stewart Edward" | "Adams, Samuel Hopkins" |
---------------------------------------------------------------------

Cleaning up the query

The query above is enough to get you going, but you can actually make it a bit more concise.

Blank Nodes

Since you're not projecting the values of ?creator_i and ?creator_j, you can actually use a blank node here; instead of writing:

?work dcterms:title ?title ;
      dcterms:creator ?creator_i .
?creator_i pgterms:name ?name_i .

you can write

?work dcterms:title ?title ;
      dcterms:creator [ pgterms:name ?name_i ] .

Property Paths

And since you're only concerned with one property of the creator, you can make this even shorter with a property path:

?work dcterms:title ?title ;
      dcterms:creator/pgterms:name ?name_i .

Final Result

After doing that, you'd have this query and result:

prefix dcterms: <http://purl.org/dc/terms/> 
prefix pgterms: <http://www.gutenberg.org/2009/pgterms/> 

select ?title ?name_i ?name_j where {
  ?work dcterms:title ?title ;
        dcterms:creator/pgterms:name ?name_i .
  optional { 
    ?work dcterms:creator/pgterms:name ?name_j .
    filter( ?name_i != ?name_j )
  }
}

---------------------------------------------------------------------
| title         | name_i                  | name_j                  |
=====================================================================
| "The Mystery" | "Adams, Samuel Hopkins" | "White, Stewart Edward" |
| "The Mystery" | "White, Stewart Edward" | "Adams, Samuel Hopkins" |
---------------------------------------------------------------------
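
If the two mirrored rows are more than you need, one possible variant keeps each pair of co-authors only once by ordering the names. This is a sketch rather than part of the query above: it drops the optional block, so works with a single author are not listed at all, and it assumes that comparing the names as strings is acceptable.

```sparql
prefix dcterms: <http://purl.org/dc/terms/>
prefix pgterms: <http://www.gutenberg.org/2009/pgterms/>

select ?title ?name_i ?name_j where {
  ?work dcterms:title ?title ;
        dcterms:creator/pgterms:name ?name_i ;
        dcterms:creator/pgterms:name ?name_j .
  # Keep only one ordering of each pair; this also excludes
  # the case where both variables are bound to the same name.
  filter( str(?name_i) < str(?name_j) )
}
```

For "The Mystery", this would be expected to leave only the row with "Adams, Samuel Hopkins" as ?name_i and "White, Stewart Edward" as ?name_j.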
Licensed under: CC-BY-SA with attribution