Question

I'd like to test how well Virtuoso handles distributed querying.

For example, I have a large RDF graph (more than 100 GB) and I want to use a cluster to query this graph.

Can Virtuoso split the graph into smaller graphs and distribute them across a cluster, or should I split the graph and aggregate the query results manually? In other words, is it possible to use Virtuoso for distributed querying? If it is, where can I find a guide for this?

Thanks in advance.


Solution

Someone asked an ominously similar question on the OpenLink Support forums a few days ago; are you the same person?

What is the reason for wanting to split this large RDF graph (more than 100 GB)? How many triples does that equate to?

There is a Virtuoso Clustered Edition, available only in commercial form, that lets multiple Virtuoso instances spread across multiple physical instances and/or machines pool their resources for processing large volumes of data, RDF or otherwise (e.g. SQL). That way you don't have to split graphs physically: you simply load the data into the clustered instance, it is partitioned for you automatically, and you query it as if it were a single Virtuoso instance, with good locality, which is the key to performance.
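Virtuoso's own partitioning scheme isn't described here. As a generic illustration of why automatic partitioning can preserve locality, here is a minimal Python sketch assuming hash partitioning on the subject IRI; the functions `partition_of` and `partition_triples` and the partition count are hypothetical and are not part of any Virtuoso API.

```python
import hashlib
from collections import defaultdict


def partition_of(subject: str, num_partitions: int) -> int:
    """Map a subject IRI to a partition using a stable hash.

    Hypothetical scheme for illustration only, not Virtuoso's actual one.
    """
    digest = hashlib.sha1(subject.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_triples(triples, num_partitions):
    """Group (s, p, o) triples so that every triple sharing a subject
    lands in the same partition."""
    parts = defaultdict(list)
    for s, p, o in triples:
        parts[partition_of(s, num_partitions)].append((s, p, o))
    return dict(parts)


triples = [
    ("http://dbpedia.org/resource/Nevis",
     "http://www.w3.org/2000/01/rdf-schema#label", '"Nevis"@en'),
    ("http://dbpedia.org/resource/Nevis",
     "http://dbpedia.org/ontology/country",
     "http://dbpedia.org/resource/Saint_Kitts_and_Nevis"),
    ("http://dbpedia.org/resource/Paris",
     "http://www.w3.org/2000/01/rdf-schema#label", '"Paris"@en'),
]

# Split the toy graph across 4 hypothetical partitions (nodes).
parts = partition_triples(triples, num_partitions=4)
for partition, members in sorted(parts.items()):
    print(partition, [s for s, _, _ in members])
```

Because every triple about `<http://dbpedia.org/resource/Nevis>` ends up in the same partition, a pattern such as `<http://dbpedia.org/resource/Nevis> ?p ?o` can be answered by a single node without exchanging intermediate results with the others.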

Virtuoso also supports the standard SPARQL-FED syntax for distributed query execution, as detailed on the W3C web site, using the SERVICE clause to perform the remote execution and return the result via your local Virtuoso instance. For example, a query run from a local Virtuoso instance against the remote DBpedia SPARQL endpoint would be:

SELECT * WHERE {
  SERVICE <http://dbpedia.org/sparql> {
    SELECT * WHERE {
      ?s ?p ?o .
      FILTER (?s = <http://dbpedia.org/resource/Nevis>)
    } LIMIT 100
  }
}

Thus the data could be split across multiple single-server instances (open source, commercial, or any other store with SPARQL-FED support) and queried that way, but you would have to split the graph manually yourself, and the performance of SPARQL-FED is generally not very good, as you lose locality and the internal optimisations of a "true" clustered server solution.
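If the graph were split manually across several instances, one way to query all of them from a local instance is to send the same pattern to each endpoint in its own SERVICE clause and combine the branches with UNION. The minimal Python sketch below only builds that query string; the helper `federated_query` and the endpoint URLs (`node1.example.org`, `node2.example.org`) are hypothetical, and the query would still have to be submitted to your local instance's SPARQL endpoint.

```python
def federated_query(endpoints, pattern, limit=100):
    """Build a SPARQL 1.1 query that evaluates the same graph pattern on
    every endpoint via SERVICE and UNIONs the results.

    The helper and the endpoint URLs passed to it are hypothetical.
    """
    branches = [f"  {{ SERVICE <{url}> {{ {pattern} }} }}" for url in endpoints]
    body = "\n  UNION\n".join(branches)
    return f"SELECT * WHERE {{\n{body}\n}}\nLIMIT {limit}"


# Hypothetical instances, each holding one manually split part of the graph.
endpoints = [
    "http://node1.example.org:8890/sparql",
    "http://node2.example.org:8890/sparql",
]
pattern = "?s ?p ?o . FILTER (?s = <http://dbpedia.org/resource/Nevis>)"

query = federated_query(endpoints, pattern)
print(query)
```

Each endpoint evaluates only its own fragment of the data; any join whose triples live on different instances has to be computed by the local instance after the results are shipped back, which is exactly the loss of locality described above.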

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow