When is running a graph database like OrientDB in cluster mode required?

https://dba.stackexchange.com/questions/256766

21-02-2021

Question

OrientDB can run in a distributed environment and supports ACID transactions and peer-to-peer replication.

According to the book "NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence" by M. Fowler and P. Sadalage, graph databases work best in a single server configuration (chapter 4.1).

What is the reason for this statement and when should I use a clustered version for a graph database?

Solution

You ask two different questions here:

[W]hen should I use a clustered version for a graph database?

This one is easy: as soon as your scalability or availability requirements cannot be met by One Giant Server.

[G]raph databases work best in a single server configuration[...] What is the reason for this statement?

Looking at the full book quote you can see a hint at the reason (emphasis mine):

Although a lot of NoSQL databases are designed around the idea of running on a cluster, it can make sense to use NoSQL with a single-server distribution model if the data model of the NoSQL store is more suited to the application. Graph databases are the obvious category here—these work best in a single-server configuration.

As the book to which you are referring explains, there are two non-mutually-exclusive approaches to distributed databases: sharding (partitioning) and replication.

Sharding works well when all entities in a shard are tightly coupled together and simultaneously loosely, or not at all, coupled with entities in other shards. This is rarely the case with a graph model, where each node can potentially be linked to all other nodes¹.
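
To make the coupling argument concrete, here is a minimal Python sketch that counts how many edges end up crossing a shard boundary. The graphs, the `by_group` shard assignment, and the function name are all hypothetical and chosen only for illustration; they are not taken from the book or from OrientDB.

```python
from itertools import combinations


def cross_shard_edges(edges, shard_of):
    """Count edges whose two endpoints live on different shards."""
    return sum(1 for a, b in edges if shard_of(a) != shard_of(b))


def by_group(node):
    """Hypothetical shard assignment: nodes 0-2 on shard 0, nodes 3-5 on shard 1."""
    return 0 if node < 3 else 1


# Two tightly coupled groups with no links between them: shards cleanly.
clustered = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]

# Every node linked to every other node: the worst case for sharding.
dense = list(combinations(range(6), 2))

print(cross_shard_edges(clustered, by_group), "of", len(clustered))
print(cross_shard_edges(dense, by_group), "of", len(dense))
```

In the first graph no edge crosses a shard, so a traversal stays on one server. In the fully connected graph most edges cross, and every such edge followed during a traversal would become a network hop.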

Replication works well when updates to your data are small in scope (e.g. a small number of entity instances per unit of work). Again, in a graph model a change to one node can potentially affect the entire graph, or a large portion of it, which in an ACID-compliant database causes steep growth in transaction volume, since you must update at least a majority of replicas before the transaction can be considered durable.
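
As a rough back-of-the-envelope illustration, the following Python sketch estimates the number of replica writes needed before a commit. It assumes a deliberately simplified model in which every node touched by a transaction must be written to a majority of replicas; the function names are hypothetical and this is not a description of OrientDB's actual replication protocol.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def replica_writes(touched_nodes: int, replicas: int) -> int:
    """Writes needed before commit, assuming each touched node
    must reach a majority of replicas (simplified model)."""
    return touched_nodes * majority(replicas)


# A narrow update versus one that ripples through a large part of the graph,
# both on a hypothetical 5-replica cluster.
print(replica_writes(2, 5))
print(replica_writes(1000, 5))
```

Under these assumptions the cost scales with the number of nodes a transaction touches, which is why wide-reaching graph updates are expensive to replicate synchronously.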

Things become even more complex with peer-to-peer, that is bidirectional, replication, where you have to deal with conflict resolution on top of that.

As a result, distributing a graph database (where a true graph model is justified) does not provide much benefit from the scalability perspective. Partitioning a graph database doesn't improve availability either.

¹ If it is not, then you need multiple graph databases, or even a relational database instead.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange