Question

In this presentation there was a chart that showed the following horizontal scalability ceiling as data gets larger:

key-value > column family > document database > graph database

http://youtu.be/UodTzseLh04?t=13m36s

In other words, as data gets more connected (i.e. more complex), the ceiling on how large the database can grow gets lower.

Why do document databases not scale to as much data as key-value stores? Have I answered my own question by saying "the more freedom in connecting data, the harder it is to partition data"?

(The "what I'm trying to do" part which everyone usually asks: I have a database with a schema that is MOSTLY tree-like but occasionally has nodes with 2 parents. I used Neo4j in my prototype, but for a production-scale app I'd need to think more about partitioning. I'm going to have to use MongoDB since graph databases cannot easily be partitioned, and it will be harder to write code for my "multiple parents" relationships in MongoDB. So I'm wondering if it's worth going the extra mile and using a key-value store, or at least a column family store.)


Solution

For graph databases, I would consider looking at Titan for scalability: https://github.com/thinkaurelius/titan

They wrote a good blog post recently about how their database engine stores data for scaling/performance: http://thinkaurelius.com/2013/11/01/a-letter-regarding-native-graph-databases/

Titan can also be configured to use Cassandra as its storage backend, so you get the benefit of a column family store as well.

I think you hit the nail on the head with your understanding of relationships (one entity relating to another) and scalability.

The more "joins" or "connections" you have to manage, the harder it is to scale.

Key/value systems assume you will relate data in your application. There is no concept of a query beyond looking a value up by its key, so to scale you can shard on the key. Pretty easy and very scalable.
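To make the "shard based on the key" idea concrete, here is a minimal sketch in Python. It is not how any particular key/value product works; the class name `ShardedKV`, the helper `shard_for`, and the shard count are all made up for illustration, and each "shard" is just an in-memory dict standing in for a separate server.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size, chosen only for the example


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index using a stable hash of the key alone."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedKV:
    """Toy key/value store: each dict stands in for one server."""

    def __init__(self, num_shards: int = NUM_SHARDS):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        # The key alone decides where the value lives.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        # A lookup touches exactly one shard; no other server is involved.
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedKV()
store.put("user:42", {"name": "Ada"})
print(store.get("user:42"))
```

Because every read and write is routed by the key and nothing else, adding servers mostly means changing how keys map to shards. A real system would likely use consistent hashing or range partitioning rather than a plain modulo, so that fewer keys move when the shard count changes.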

If you read some of the articles about Titan it's easy to see why it's hard to scale something like a graph database.
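As a rough illustration of why connections hurt, the sketch below counts how many relationships end up spanning two shards for a given placement of nodes. It is only a toy model, not Titan's or Neo4j's partitioning scheme; the function names `count_cut_edges` and `hash_assignment` and the sample data (a small tree where node "c" has two parents, as in the question) are assumptions made for the example.

```python
import hashlib


def hash_assignment(nodes, num_shards):
    """Place each node on a shard by hashing its id (ignores relationships)."""
    placement = {}
    for node in nodes:
        digest = hashlib.sha256(node.encode("utf-8")).digest()
        placement[node] = int.from_bytes(digest[:8], "big") % num_shards
    return placement


def count_cut_edges(edges, placement):
    """Count edges whose two endpoints live on different shards.

    Traversing such an edge means a network hop between servers.
    """
    return sum(1 for a, b in edges if placement[a] != placement[b])


# Mostly a tree, but "c" has two parents ("a" and "b").
edges = [("root", "a"), ("root", "b"), ("a", "c"), ("b", "c")]

# A hand-picked placement: "b" sits alone on shard 1.
manual = {"root": 0, "a": 0, "b": 1, "c": 0}
print(count_cut_edges(edges, manual))  # 2: root-b and b-c cross shards

# Hash placement is easy to compute but pays no attention to the edges.
nodes = ["root", "a", "b", "c"]
print(count_cut_edges(edges, hash_assignment(nodes, 2)))
```

With a key/value store this count does not matter, because the store never follows a relationship itself. A graph traversal, on the other hand, pays for every cut edge, and finding a placement that keeps the count low gets harder as the data becomes more connected.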

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow