Question

I like it (very much) that it supports SPARQL/Update and the SPARQL endpoint that comes with it, but

  • I'm a little worried about vendor lock in
  • I think it is overkill for my requirements (I want a graph store with half a billion triples)
  • I would love to use an open-source and free product instead

So far I couldn't find any decent, comparable products (commercial or otherwise). The ones I've seen look immature or experimental to me. Ideas?

Solution

You might be looking for 4store (http://4store.org/). You might also try searching for similar questions on http://www.semanticoverflow.com/ (link is defunct).

OTHER TIPS

Besides 4store, which @dajobe has already mentioned, two others are Dydra and the Talis platform. Vendor lock-in should not, in general, be a problem if you stick to language features specified in the SPARQL standards.

Having used a lot of different triple stores as storage layers in my research project, I would recommend the following two:

  • 4store - Already mentioned by dajobe. It is very good, with frequent releases that fix bugs and add new features as SPARQL 1.1 continues to be standardised. It also has the benefit of being totally free.
  • AllegroGraph - Free for up to 50 million triples, though it tends to be quite a RAM hog even at relatively low numbers of triples (e.g. it used around 3 GB of my 4 GB of RAM when I had about 1.5 million triples). Actual memory usage will vary with usage; in my case I was running an app that required my entire dataset to be loaded into memory. I haven't used Version 4, so I can't say whether they have improved this.

While Virtuoso is very good at some things, it has a seriously bad case of feature creep and a lot of non-standard/proprietary features which, as you imply, might lead to vendor lock-in.

As Ian says, stick to the core language features in the SPARQL standards and you can easily move to a different triple store as your needs change. When developing your application, try to design it to be storage-agnostic so you can plug in a different storage layer as you need to. How easy this is will depend on your programming environment/language/API, but doing it will be beneficial in the long run.
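
To make the storage-agnostic idea a bit more concrete, here is a minimal sketch in Python (my choice of language, not something from the question). It assumes the store exposes endpoints that follow the standard SPARQL 1.1 Protocol, where a query is POSTed as a form field named query and an update as a form field named update. The class name SparqlStore and the example.org URLs are hypothetical. The code only builds the HTTP requests; it does not send them.

```python
from dataclasses import dataclass
from urllib.parse import urlencode
from urllib.request import Request


@dataclass(frozen=True)
class SparqlStore:
    """Hypothetical thin wrapper: the only store-specific parts are two URLs."""

    query_url: str
    update_url: str

    def _form_post(self, url: str, field: str, text: str, accept: str) -> Request:
        # SPARQL 1.1 Protocol: operation sent as a URL-encoded form field.
        body = urlencode({field: text}).encode("utf-8")
        headers = {
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": accept,
        }
        return Request(url, data=body, headers=headers, method="POST")

    def query_request(self, sparql: str) -> Request:
        # Ask for the standard JSON results format so parsing is store-neutral.
        return self._form_post(
            self.query_url, "query", sparql, "application/sparql-results+json"
        )

    def update_request(self, sparql: str) -> Request:
        return self._form_post(self.update_url, "update", sparql, "*/*")


# Swapping stores should then only mean changing these (made-up) URLs.
store = SparqlStore(
    query_url="http://example.org/sparql",
    update_url="http://example.org/update",
)
select_req = store.query_request("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
insert_req = store.update_request(
    "INSERT DATA { <http://example.org/a> <http://example.org/b> 'c' }"
)
```

Passing either request to urllib.request.urlopen would send it. As long as the query text itself stays within standard SPARQL, nothing outside the two URLs should need to change when you move to another store.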

We have had positive experience with Bigdata. 4store (as mentioned above) is also good, but does not support transactions.

  • I'm a little worried about vendor lock in

OpenLink Software (my employer) works very hard to implement open standards and specifications where they exist and are sufficient. We add extensions, and document that we've done so, when necessary -- as with the aggregate and other analytics functions which were not part of SPARQL 1.0, but are part of SPARQL 1.1 and/or will be part of SPARQL 2.0.

If you stick with the published standards, you won't be locked in. If you need the extensions, we think we're not so much locking you in as enabling and empowering you... but your mileage may vary.

  • I think it is overkill for my requirements (I want a graph store with half a billion triples)

By all means, consider all the functionality you need when making your decision. But it seems likely to me that you'll be doing more than storing your triples. Queries, reasoning, query optimization, Federated SPARQL (joins against other remote SPARQL endpoints, formerly known as SPARQL-FED), and other functionality may not be so much overkill as simply not-yet-needed.

It's worth noting that Virtuoso can be run in a minimized form (LiteMode=1) which disables many of the features perceived as "overkill" and makes it much more like an embedded DBMS -- but still hybrid at the core. When Lite mode is on:

  • Web services are not initialized, i.e., no web server, DAV, SOAP, POP3, etc.
  • replication is stopped
  • PL debugging is disabled
  • plugins are disabled
  • Bonjour/Rendezvous is disabled
  • tables relevant to the above are not created
  • index tree maps is set to 8 if no other setting is given
  • memory reserve is not allocated
  • DisableTcpSocket setting is treated as 1, regardless of value in INI file
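
As a rough illustration of switching this on, the sketch below uses Python's standard configparser to set LiteMode to 1 in INI-style text. It assumes the setting lives in the [Parameters] section of virtuoso.ini (check your own file), and the sample text is a made-up fragment, not a complete configuration. The function name enable_lite_mode is hypothetical. Note that configparser drops comments when it writes the file back out, so for a real INI file editing by hand may be the safer route.

```python
import configparser
import io

# Made-up fragment for illustration only; a real virtuoso.ini is much longer.
SAMPLE_INI = """\
[Parameters]
ServerPort = 1111
LiteMode = 0
"""


def enable_lite_mode(ini_text: str) -> str:
    """Return the INI text with LiteMode set to 1 under [Parameters]."""
    parser = configparser.ConfigParser(interpolation=None)
    # Keep key names as written; configparser lowercases them by default.
    parser.optionxform = str
    parser.read_string(ini_text)
    if not parser.has_section("Parameters"):
        parser.add_section("Parameters")
    parser.set("Parameters", "LiteMode", "1")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


updated = enable_lite_mode(SAMPLE_INI)
print(updated)
```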

  • I would love to use an open-source and free product instead

Virtuoso has two flavors -- commercial (VCE), and open source (VOS). Commercial includes shared-nothing elastic clustering which brings linear scalability, SPARQL GEO indexing and querying, result transformation to CXML for exploration with PivotViewer, and other features which VOS lacks ... but use the one that makes sense to you.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow