Distributed querying in Virtuoso

Question

Someone asked a strikingly similar question on the OpenLink Support forums a few days ago. Are you the same person?

What is the reason for wanting to split this large RDF graph (more than 100GB), and how much does that equate to in terms of triples?

There is a Virtuoso Clustered Edition, available in commercial form only, that enables multiple Virtuoso instances spread across multiple physical instances and/or machines to pool their resources for processing large volumes of data, RDF or otherwise (e.g. SQL). That way you don't have to physically split graphs: you simply load the data into the clustered instance, it is partitioned automatically for you, and you query it as if it were a single Virtuoso instance, with good locality, which is the key to performance.

Virtuoso also supports the standard SPARQL-FED syntax for distributed query execution, as detailed on the W3C web site, using the SERVICE clause to perform the remote execution and return the result via your local Virtuoso instance. A sample query that executes a remote query against the DBpedia SPARQL endpoint from a local Virtuoso instance would be:

SELECT * WHERE {
  SERVICE <http://dbpedia.org/sparql> {
    SELECT * WHERE {
      ?s ?p ?o .
      FILTER (?s = <http://dbpedia.org/resource/Nevis>)
    } LIMIT 100
  }
}

Thus the data could be split across multiple single-server instances (open source, commercial, or any other store with SPARQL-FED support) and queried, but you would have to split the graph yourself manually, and the performance of SPARQL-FED is generally not very good, as you lose locality and the internal optimisations of a "true" clustered server solution.
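As a rough sketch of what querying a manually split graph might look like, the query below asks two endpoints for the same pattern and merges the results with UNION. The endpoint URLs (node1.example.org and node2.example.org) are hypothetical placeholders for wherever you host each part of the graph; they are not real services, and this is only one possible layout.

```sparql
# Hypothetical setup: the graph has been split by hand across two
# single-server instances, each exposing its own SPARQL endpoint.
SELECT ?s ?p ?o
WHERE {
  {
    # First part of the split graph (placeholder endpoint URL)
    SERVICE <http://node1.example.org/sparql> {
      ?s ?p ?o .
      FILTER (?s = <http://dbpedia.org/resource/Nevis>)
    }
  }
  UNION
  {
    # Second part of the split graph (placeholder endpoint URL)
    SERVICE <http://node2.example.org/sparql> {
      ?s ?p ?o .
      FILTER (?s = <http://dbpedia.org/resource/Nevis>)
    }
  }
}
LIMIT 100
```

Each SERVICE block is a separate remote call whose results are sent back to the local instance, so a join between triples held on different nodes has to be assembled locally. That is where the loss of locality mentioned above tends to hurt.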