Question

How do I scale up Sesame? I'm planning to store a lot of triples in my Sesame store and I'm wondering what I should do in order to have a scalable solution.

Ideally I would like my (native) store distributed among several Sesame instances, so a first question is: is there a way to "shard" Sesame? If so, could you please point me to some kind of documentation?

Alternatively, should I rely on a relational backend store for better scalability?

In general, other than hardware resources and front-end load balancers, what kind of support does Sesame provide for medium/big-data scenarios?


Solution

There are several ways to scale up. I won't give a complete overview of all the possibilities here, but here are a few pointers.

A single Sesame native store scales to about 100-150 million triples on typical hardware. Beyond that, you can either use a third-party Sesame-compatible store such as USeekM, Bigdata, CumulusRDF or OWLIM (which scales well into the billions of triples), or you can use Sesame's own Federation SAIL. The federation members can be any combination of Sesame-compatible stores, including native stores running locally or remote stores accessible over HTTP.

The Federation SAIL distributes write operations using a simple size-dependent sharding algorithm, trying to spread the data equally over all members. Queries are automatically distributed across the members and the results merged.
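As a rough illustration of the setup described above, the following sketch wires two federation members (one local native store and one remote HTTP store) behind a single repository. It uses the Sesame 2.x API; the file path, server URL, and repository ID are placeholders, and error handling is omitted.

```java
import java.io.File;

import org.openrdf.repository.Repository;
import org.openrdf.repository.http.HTTPRepository;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.federation.Federation;
import org.openrdf.sail.nativerdf.NativeStore;

public class FederationExample {
    public static void main(String[] args) throws Exception {
        Federation federation = new Federation();

        // Member 1: a local native store (placeholder data directory).
        federation.addMember(new SailRepository(
                new NativeStore(new File("/data/sesame/shard1"))));

        // Member 2: a remote Sesame repository reached over HTTP
        // (placeholder server URL and repository ID).
        federation.addMember(new HTTPRepository(
                "http://host2:8080/openrdf-sesame", "shard2"));

        // Tell the federation the members hold disjoint data, so it can
        // skip cross-member duplicate elimination during query evaluation.
        federation.setDistinct(true);

        Repository repo = new SailRepository(federation);
        repo.initialize();

        // The repo can now be queried like any other Sesame repository;
        // the federation fans queries out to the members and merges results.
    }
}
```

This is a sketch under the assumptions stated above, not a complete deployment recipe; in particular, whether `setDistinct` is safe depends on how your writes are sharded.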

OTHER TIPS

Sesame's relational backend is now deprecated; an explanation was posted on their mailing list.

I'm not sure, but I don't think Sesame scales well with its native backends. As far as I know, people tend to use a third-party store such as OWLIM instead. You would probably need OWLIM-Enterprise (previously the BigOWLIM Replication Cluster) if you want a clustered solution.

If Sesame is not a hard requirement, then many people use the clustered edition of Virtuoso to store large amounts of triples.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow