Question

We're currently using Virtuoso to store RDF triples, and we want an automated way to load RDF files into the database. The data can be very large, so we rely on the Virtuoso bulk data loader; however, it's possible that in the future we will switch to some other triple store, so I don't want to depend on a platform-specific solution like this. Is there a more general, cross-platform way of loading large RDF files into triple stores?

Most of our programming is done in Python, so a solution with Python bindings would be preferable.

I'm pretty new to semantic web technologies, so please let me know if my question isn't detailed enough and I'll try to provide more information. Thank you!


Solution

There are a number of Virtuoso RDF insert methods detailed at http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtRDFInsert , most of which are specific to Virtuoso, partly because they rely on features unique to Virtuoso (such as WebDAV and ODS) or on features that other stores implement differently or not at all.

Probably the most generic method in your case would be to read the datasets into Python and use SPARQL 1.1 Update (http://www.w3.org/TR/sparql11-update/) commands to insert the data into Virtuoso or any other triple store supporting SPARQL 1.1 Update, which I imagine most do by now. The main drawback of this approach is that the insert process has to be managed in Python to ensure the data is loaded consistently, handling deadlocks, rollbacks, and so on, which makes this method much slower and probably intolerably so for very large datasets. That is why most vendors provide their own "bulk loader" methods, where data consistency, deadlocks, etc. are handled internally and much faster.
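As a rough illustration of the SPARQL 1.1 Update route, here is a minimal Python sketch that parses an RDF file with rdflib and pushes it to a store in batched `INSERT DATA` requests via SPARQLWrapper. The endpoint URL, credentials, target graph IRI, and batch size are placeholders you would adapt to your own setup; a production loader would also need retry and error handling.

```python
# Sketch: batched SPARQL 1.1 Update inserts from Python.
# Assumes rdflib and SPARQLWrapper are installed; endpoint, graph IRI,
# and credentials below are placeholders, not real values.
from rdflib import Graph
from SPARQLWrapper import SPARQLWrapper, POST, DIGEST

ENDPOINT = "http://localhost:8890/sparql-auth"   # Virtuoso's default auth endpoint; adjust per store
GRAPH_IRI = "http://example.org/graph"           # hypothetical target named graph
BATCH_SIZE = 1000                                # triples per INSERT DATA request


def insert_batch(sparql, triples):
    # Wrap the batch in a single INSERT DATA update against the target graph.
    update = "INSERT DATA { GRAPH <%s> { %s } }" % (GRAPH_IRI, " ".join(triples))
    sparql.setQuery(update)
    sparql.query()  # raises on HTTP errors; add retry/rollback handling as needed


def load_file(path):
    # Parse the RDF file into an in-memory graph. Fine for moderate files;
    # very large files would need a streaming parser instead.
    g = Graph()
    g.parse(path)  # rdflib guesses the serialization from the file extension

    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setHTTPAuth(DIGEST)
    sparql.setCredentials("dba", "dba")  # placeholder credentials
    sparql.setMethod(POST)

    batch = []
    for s, p, o in g:
        # .n3() renders each term in N-Triples/Turtle syntax
        batch.append("%s %s %s ." % (s.n3(), p.n3(), o.n3()))
        if len(batch) >= BATCH_SIZE:
            insert_batch(sparql, batch)
            batch = []
    if batch:
        insert_batch(sparql, batch)


if __name__ == "__main__":
    load_file("data.nt")
```

Because it only speaks SPARQL 1.1 Update over HTTP, the same script should work against any compliant store by changing the endpoint URL, but for very large datasets the per-request overhead is exactly the cost described above.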
