Question

I'm reading about ArangoDB and it looks very interesting, but I can't find anywhere in the documentation how ArangoDB scales. Does ArangoDB scale, and can it use sharding like MongoDB or CouchDB?

Solution

As I understand it, ArangoDB does not allow sharding (prior to version 2.0), only replication. From the link:

AvocadoDB effortlessly permits replications. We like the “zero-admin principle”. Making replications with AvocadoDB is really easy: Insert IP address and go!

Following replication types are intended for version 2:

  • master-master synchronous,
  • master-master asynchronous,
  • master-slave synchronous,
  • master-slave asynchronous

Other tips

EDIT

ArangoDB has supported sharding since version 2.0.

Version 3.0 will bring VelocyPack, which is a binary JSON representation optimized for compactness, parsability and composability. It supersedes the shape concept / shaped JSON.

/EDIT


I am the chief architect of ArangoDB.

monkegjinni is right: ArangoDB did not support sharding, only replication. Why?

Short version:

Supporting fairly complex data models like graphs and documents conflicts with how sharding works. However, with the efficiency of modern SSDs and computers, we believe that almost all projects no longer need sharding. Today's computers will easily store all the data on a single node. What these projects need is replication for load distribution, which is supported by ArangoDB.

Long version:

There are actually two separate scaling issues.

The first issue is distributing the requests over several servers to balance the request load.

ArangoDB will support this through synchronous replication of writes and distribution of the read requests.

Note that most database systems follow a very similar path, i.e., they support distributing the requests either with restricted consistency guarantees or they allow writes only on one node and distribute the read requests. They have this restriction because distributing write requests and supporting full consistency is impossible to do efficiently. And doing it inefficiently will negate the gain that we wanted to achieve through distribution.
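The first scaling issue can be illustrated with a minimal Python sketch. It is a toy in-memory model, not ArangoDB's replication code: the class `ReplicatedStore` and its methods are hypothetical names introduced here. A write returns only after every replica has applied it (synchronous replication), and reads are spread round-robin over the replicas.

```python
import itertools
from typing import Any, Dict, List


class ReplicatedStore:
    """Toy model: every replica holds a full copy of the data (hypothetical)."""

    def __init__(self, num_replicas: int) -> None:
        self.replicas: List[Dict[str, Any]] = [dict() for _ in range(num_replicas)]
        # Round-robin iterator over replica indexes for read distribution.
        self._next_reader = itertools.cycle(range(num_replicas))
        self.reads_per_replica: List[int] = [0] * num_replicas

    def write(self, key: str, value: Any) -> None:
        # Synchronous replication: apply the write to all replicas
        # before returning to the caller.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key: str) -> Any:
        # Any replica can answer, because all of them saw every write.
        index = next(self._next_reader)
        self.reads_per_replica[index] += 1
        return self.replicas[index].get(key)


store = ReplicatedStore(num_replicas=3)
store.write("user:1", {"name": "Ada"})
values = [store.read("user:1") for _ in range(6)]
```

In this model the read load is shared evenly, while every write still costs one operation per replica, which is why replication helps with request load but not with dataset size.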

The second issue is distributing the data over several servers to allow larger datasets.

ArangoDB does not support distributing the data over several servers.

We have made this decision because distributing the data over several servers always comes at a price.

This price can be very explicit. For example, the data model may be very limited. This is the route that key-value stores such as Dynamo or Riak have taken. Here the data model and the supported queries are so simple that it is always possible to direct a query to the server (or the small number of servers) on which the requested value lives.
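As a rough illustration of why a key-value model makes routing easy, here is a minimal Python sketch. It assumes simple hash-modulo placement, which is a simplification chosen for brevity and not how Dynamo or Riak actually place data; `shard_for_key` and `KeyValueCluster` are hypothetical names.

```python
import hashlib
from typing import Any, Dict, List


def shard_for_key(key: str, num_servers: int) -> int:
    """Map a key to exactly one server index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


class KeyValueCluster:
    """Toy cluster: each 'server' is just an in-memory dict (hypothetical)."""

    def __init__(self, num_servers: int) -> None:
        self.servers: List[Dict[str, Any]] = [dict() for _ in range(num_servers)]

    def put(self, key: str, value: Any) -> None:
        # The key alone determines which server stores the value.
        self.servers[shard_for_key(key, len(self.servers))][key] = value

    def get(self, key: str) -> Any:
        # A lookup contacts one server only; no other server is asked.
        return self.servers[shard_for_key(key, len(self.servers))].get(key)


cluster = KeyValueCluster(num_servers=4)
cluster.put("user:42", {"name": "Ada"})
found = cluster.get("user:42")
```

The restriction is visible in the interface: only `put` and `get` by key are offered, because any query that does not start from the key gives the cluster no way to pick a single server.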

Note that we do believe that this approach is valid for some applications (e.g. Amazon's database). But we believe that the number of applications that truly need to store so much data that they must distribute it over a large number of servers, and must therefore restrict the access pattern to key-value, is very small.

Or the price can be hidden. This is the case, for example, if the data is distributed and the database system allows general queries. In that case the query must be sent to all servers (because the data you are looking for may live on any of them). That makes the queries inefficient.
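The hidden price can be sketched in a few lines of Python. This is a toy model under the assumption that documents are spread over shards without regard to the queried attribute; `scatter_gather` is a hypothetical helper, not an API of any database.

```python
from typing import Callable, Dict, List, Tuple

Document = Dict[str, object]


def scatter_gather(
    shards: List[List[Document]],
    predicate: Callable[[Document], bool],
) -> Tuple[List[Document], int]:
    """Run a general query against every shard and merge the results.

    Returns the matching documents and the number of shards contacted.
    """
    results: List[Document] = []
    contacted = 0
    for shard in shards:
        # Nothing tells us which shard holds a match, so each one is asked.
        contacted += 1
        results.extend(doc for doc in shard if predicate(doc))
    return results, contacted


shards = [
    [{"name": "Ada", "age": 36}],
    [{"name": "Bob", "age": 25}],
    [],
]
matches, contacted = scatter_gather(shards, lambda doc: doc["age"] > 30)
```

Even though only one shard holds a match here, all three are contacted, and the cost of the query grows with the number of servers rather than with the size of the result.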

The ArangoDB approach is rather to squeeze the most out of one server (ArangoDB does support multiple servers, but for the sake of availability). For this it uses two main strategies.

One strategy is to make use of SSDs. Note that the capacity of SSDs is growing at an incredible rate (you can buy a terabyte of SSD for far less money than a second server would cost you). Endurance (the total amount of data that can be written to an SSD) now reaches petabytes, since vendors have finally gotten the wear-leveling algorithms right, so the reliability of SSDs is no longer an issue. And the performance of those SSDs is very good (closer to main memory than to ordinary disks).

The other strategy is to store the data efficiently. ArangoDB uses shapes to store documents: a shape records which attributes and attribute types a document has, and all documents with the same shape share one representation of this information. This means that documents can be stored in less space than the JSON or BSON representation would require.
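The idea behind shapes can be sketched as follows. This is a simplified Python illustration for flat documents only and does not reflect ArangoDB's actual on-disk format; `ShapedStore` and its methods are hypothetical names.

```python
from typing import Any, Dict, List, Tuple

# A shape is the sorted list of (attribute name, type name) pairs.
Shape = Tuple[Tuple[str, str], ...]


class ShapedStore:
    """Toy store that keeps each shape once and only values per document."""

    def __init__(self) -> None:
        self.shapes: List[Shape] = []
        self.shape_ids: Dict[Shape, int] = {}
        self.rows: List[Tuple[int, Tuple[Any, ...]]] = []

    def insert(self, doc: Dict[str, Any]) -> int:
        names = sorted(doc)
        shape: Shape = tuple((name, type(doc[name]).__name__) for name in names)
        shape_id = self.shape_ids.get(shape)
        if shape_id is None:
            # First document with this shape: register the shape once.
            shape_id = len(self.shapes)
            self.shapes.append(shape)
            self.shape_ids[shape] = shape_id
        # The document itself stores a shape id plus its values only.
        self.rows.append((shape_id, tuple(doc[name] for name in names)))
        return len(self.rows) - 1

    def get(self, row_id: int) -> Dict[str, Any]:
        shape_id, values = self.rows[row_id]
        shape = self.shapes[shape_id]
        return {name: value for (name, _), value in zip(shape, values)}


store = ShapedStore()
first = store.insert({"name": "Ada", "age": 36})
second = store.insert({"age": 25, "name": "Bob"})
third = store.insert({"name": "Eve"})
```

The first two documents share one shape, so their attribute names and types are stored a single time, whereas a JSON or BSON encoding would repeat `"name"` and `"age"` in every document.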

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow