Question

I am currently investigating Virtuoso, and I would really like to know what the differences are between the Native RDF Quad Store and the SQL Based RDF Triple Store as shown on this page (scroll down a bit to see the figure): http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtJenaProvider

I know that the Native RDF Quad Store uses a traditional relational database under the hood, but that it is optimized for faster SPARQL requests. That confuses me, because I now wonder what the SQL Based RDF Triple Store is, then.

Thanks in advance!

Solution

Virtuoso is in fact a whole suite of applications and service layers built on top of its own SQL database, hence the understandable confusion.

The Native RDF Quad Store is Virtuoso's own quad store implementation which, ironically enough, is actually SQL based, as you point out. It is stored and implemented entirely within Virtuoso's own SQL database. Thus, although it is SQL based, it has a fixed table layout and makes use of custom data types to store the data.

The SQL Based RDF Triple Store refers to a feature of commercial versions of Virtuoso which allows you to define mapping rules that treat arbitrary ordinary relational databases (both Virtuoso based and those on other backends, e.g. MySQL, PostgreSQL) as an RDF store.

Performance Differences

The performance difference comes from the fact that the Native Quad Store has a known layout, custom RDF data types and lots of software optimizations specific to it within Virtuoso's stack. Therefore, when Virtuoso takes in SPARQL and compiles it to an equivalent SQL query, that query runs extremely efficiently on its database. The use of custom RDF data types allows all the SPARQL logic to be pushed down into the query engine layer, which also makes evaluation faster.
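To make the "SPARQL compiled to SQL over a known layout" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `quad` table, its column names and the `urn:` identifiers are assumptions made up for illustration; this is not Virtuoso's actual schema, its custom data types, or the SQL it really generates.

```python
import sqlite3

# Illustrative only: a fixed four-column table holding graph, subject,
# predicate and object. Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quad (g TEXT, s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO quad VALUES (?, ?, ?, ?)",
    [
        ("urn:g1", "urn:alice", "urn:knows", "urn:bob"),
        ("urn:g1", "urn:bob", "urn:name", "Bob"),
        ("urn:g1", "urn:alice", "urn:name", "Alice"),
    ],
)

# SPARQL being translated by hand:
#   SELECT ?name WHERE { <urn:alice> <urn:knows> ?x . ?x <urn:name> ?name }
# Because the layout is known in advance, each triple pattern becomes one
# alias of the same table, and the shared variable ?x becomes a join condition.
sql = """
    SELECT t2.o AS name
    FROM quad AS t1
    JOIN quad AS t2 ON t2.s = t1.o
    WHERE t1.s = 'urn:alice' AND t1.p = 'urn:knows' AND t2.p = 'urn:name'
"""
names = [row[0] for row in conn.execute(sql)]
print(names)
```

The whole query is answered by a single SQL statement inside the database engine, which is the property the answer attributes to the native store.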

For the SQL Based Triple Store there is a mapping layer involved: Virtuoso has to call out to the SQL database (which may be external) and translate its content into RDF form before it can do the computations necessary to answer the SPARQL queries. The mapping step can be extremely costly, and it makes queries harder to optimize because less information about the RDF data is available up front.

In addition, since the data is often just in standard SQL types, certain logic can't be pushed down to the underlying query engine, because the SQL and SPARQL type semantics don't align in many cases. The values therefore have to be extracted and converted appropriately, and the expression results computed above the query engine layer and fed back in as necessary. This reduces performance because the engine has to switch between different processing contexts and potentially make many SQL queries to answer a single SPARQL query.
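As a rough illustration of the mapping overhead described above, the following sketch exposes an ordinary relational table as triples and evaluates a filter outside the database. The `employee` table, the `rows_as_triples` helper and the `urn:` identifiers are all hypothetical; Virtuoso's real mapping rules are declarative and far more sophisticated, so treat this only as a toy model of where the extra work comes from.

```python
import sqlite3
from datetime import date

# An ordinary relational table; the hire date is stored as plain text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, hired TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Alice", "2009-03-01"), (2, "Bob", "2012-11-15")],
)

def rows_as_triples(connection):
    """Hypothetical mapping rule: each row yields a subject with two properties."""
    query = "SELECT id, name, hired FROM employee ORDER BY id"
    for emp_id, name, hired in connection.execute(query):
        subject = f"urn:employee:{emp_id}"
        yield (subject, "urn:name", name)
        # The text value is converted to a typed date outside the SQL engine.
        yield (subject, "urn:hired", date.fromisoformat(hired))

# Equivalent of: ?s <urn:hired> ?d . FILTER(?d < "2010-01-01")
# The comparison runs in Python, above the query engine, on converted values.
cutoff = date(2010, 1, 1)
early_hires = [s for s, p, o in rows_as_triples(conn) if p == "urn:hired" and o < cutoff]
print(early_hires)
```

Here every row is fetched and converted before the filter is applied, whereas the native layout in the earlier discussion lets the engine do the filtering itself.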

OTHER TIPS

Virtuoso is a hybrid (Relational Tables and RDF Property Graphs) data server [1] implemented using a SQL RDBMS core. SQL RDBMS relations are only external if the data sources in question are external to Virtuoso, e.g., when ODBC or JDBC is used to attach tables in external RDBMS databases to Virtuoso as part of its Virtual Database functionality [2].

[1] http://virtuoso.openlinksw.com/images/virtuoso3arch.gif -- Virtuoso technical architecture diagram

[2] http://docs.openlinksw.com/virtuoso/vdbengine.html -- Virtual Database Engine

Licensed under: CC-BY-SA with attribution