Question

If the relationships between the data are as important as the data itself (such as distance or path calculations), then don't use a column family/big table database.

(Quoted from article Big data woes: Which database should I use? by Andrew Oliver)

Could someone elaborate on what Andrew meant by this? It is not entirely evident to me.


Solution

Big data usually means that the database is distributed across multiple servers. Table-based databases usually have severe scaling problems when you need to join entries that live on different servers. That makes them unsuitable for use cases that focus on connections between database entries. Their query languages are also often poorly equipped for analyzing connections.
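To make the scaling problem concrete, here is a rough sketch in plain Python. It is a toy model, not how any particular column family store works: the `ROWS` data, the `server_for` placement function, and the server count are all made up for illustration. It counts how many relationship lookups cross a server boundary when you follow connections a few hops out.

```python
# Toy model (hypothetical): rows are spread over servers by a simple key hash.
NUM_SERVERS = 3

def server_for(key: str) -> int:
    # Deterministic toy placement; real systems use their own partitioners.
    return sum(key.encode()) % NUM_SERVERS

# Each row stores the keys of the rows it is related to.
ROWS = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": [],
}

def count_remote_hops(start: str, depth: int) -> int:
    """Count lookups that land on a different server than the referencing row."""
    remote = 0
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for key in frontier:
            for neighbor in ROWS[key]:
                if server_for(neighbor) != server_for(key):
                    remote += 1  # this hop would need a network round trip
                next_frontier.append(neighbor)
        frontier = next_frontier
    return remote
```

Every extra hop multiplies the number of rows to fetch, and with hash-style placement most of those fetches end up on another machine. That is the cost that makes path or distance queries painful in this kind of store.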

In that case you should consider using a graph database like Neo4j. The pros and cons of graph databases are described further below in the article you quoted.

Common uses for graph databases include geospatial problems, recommendation engines, network/cloud analysis, and bioinformatics -- basically, anywhere that the relationship between the data is just as important as the data itself.

Graph databases make it easy to follow relationships between database entries. They make it easy to query for things like the friends of the friends of a user or all users with common interests.
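As an illustration of the "friends of the friends" query, here is a minimal sketch in plain Python over an in-memory adjacency map. It does not use Neo4j or any graph database API; the `FRIENDS` data and the `friends_of_friends` function are assumptions made up for this example. A graph database keeps relationships in a form that makes this kind of traversal a direct operation.

```python
# Hypothetical friendship graph: each user maps to the set of direct friends.
FRIENDS = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "erin"},
    "dave": {"bob"},
    "erin": {"carol"},
}

def friends_of_friends(user: str) -> set:
    """Return users exactly two hops away: not the user, not a direct friend."""
    direct = FRIENDS.get(user, set())
    result = set()
    for friend in direct:
        # Follow each friend's own relationships one more step.
        result |= FRIENDS.get(friend, set())
    return result - direct - {user}
```

For example, `friends_of_friends("alice")` walks from alice to bob and carol, then on to their friends, and drops alice herself and anyone she already knows.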

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow