Question

What are the pros and cons of MongoDB (document-based), HBase (column-based), and Neo4j (object graph)?

I'm particularly interested to know some of the typical use cases for each one.

What are good examples of problems that graphs can solve better than the alternative?

Are there any worthwhile presentations on Slideshare or Scribd?


Solution

MongoDB

Scalability: Highly available and consistent but sucks at relations and many distributed writes. Its primary benefit is storing and indexing schemaless documents. Document size is capped at 4 MB (raised to 16 MB in later releases) and indexing only makes sense for limited depth. See http://www.paperplanes.de/2010/2/25/notes_on_mongodb.html

Best suited for: Tree structures with limited depth

Use Cases: Diverse Type Hierarchies, Biological Systematics, Library Catalogs
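To make "tree structures with limited depth" concrete, here is a minimal sketch in plain Python (no MongoDB driver involved) of a catalog entry stored as one nested document. The field names and the `get_path` and `depth` helpers are hypothetical, made up for illustration; `get_path` only mimics the dot notation MongoDB uses to reach embedded fields.

```python
# A library catalog entry modeled as one self-contained, nested document.
# All field names and values are hypothetical placeholders.
book = {
    "title": "Example Title",
    "authors": [{"name": "First Author"}, {"name": "Second Author"}],
    "classification": {"section": "Computing", "shelf": {"code": "QA76"}},
}


def get_path(doc, dotted_path):
    """Follow a dotted path such as 'classification.shelf.code' through nested dicts."""
    node = doc
    for key in dotted_path.split("."):
        node = node[key]
    return node


def depth(value):
    """Nesting depth of a document: scalars count 0, each dict or list adds 1."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    return 0
```

The whole record comes back in a single lookup because everything lives in one shallow tree. Once entries need to reference each other many levels deep, that advantage fades.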

Neo4j

Scalability: Highly available but not distributed. Powerful traversal framework for high-speed traversals in the node space. Limited to graphs around several billion nodes/relationships. See http://highscalability.com/neo4j-graph-database-kicks-buttox

Best suited for: Deep graphs with unlimited depth and cyclical, weighted connections

Use Cases: Social Networks, Topological analysis, Semantic Web Data, Inferencing
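As a rough illustration of the kind of problem meant by "cyclical, weighted connections", the sketch below finds the cheapest path through a small weighted graph that contains a cycle. It is plain in-memory Python using Dijkstra's algorithm, not Neo4j's traversal API; the `GRAPH` data and the function name are assumptions made up for this example.

```python
import heapq

# Hypothetical weighted graph with a cycle: alice -> bob -> carol -> alice.
GRAPH = {
    "alice": {"bob": 1, "dave": 5},
    "bob": {"carol": 2},
    "carol": {"alice": 1, "dave": 1},
    "dave": {},
}


def cheapest_path_cost(graph, start, goal):
    """Dijkstra's algorithm: lowest total edge weight from start to goal, or None."""
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return None  # goal is unreachable from start
```

In a relational or document store every hop here would typically be another join or lookup, which is why path queries of unknown depth tend to fit a graph database better.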

HBase

Scalability: Reliable, consistent storage in the petabytes and beyond. Supports very large numbers of objects with a limited set of sparse attributes. Works in tandem with Hadoop for large data processing jobs. See http://www.ibm.com/developerworks/opensource/library/os-hbase/index.html

Best suited for: directed, acyclic graphs

Use Cases: Log analysis, Semantic Web Data, Machine Learning

OTHER TIPS

I know this might seem like an odd place to point to, but Heroku has recently gone nuts with their NoSQL offerings and has an OK overview of many of the current projects. It is in no way a Slideshare-style presentation, but it will help you start the comparison process:

http://blog.heroku.com/archives/2010/7/20/nosql/?utm_medium=email&utm_source=EmailBlast&utm_content=619506254&utm_campaign=HerokuSeptemberNewsletter-VersionB&utm_term=NoSQLHerokuandYou

Check out this at-a-glance comparison of NoSQL databases:

http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

You could also evaluate a multi-model DBMS, the second generation of NoSQL products. With a multi-model database you don't have to accept the compromises of choosing just one model, because you can use more than one.

The first multi-model NoSQL is OrientDB.

MongoDB:

MongoDB is a document database, unlike a relational database. A document stores semi-structured data, like a JSON object (schema-free).

Key features:

  1. Schema can change as the application evolves
  2. Full indexing
  3. Load balancing and data sharding
  4. Data replication
  5. Consistency and partition tolerance (CP) under the CAP theorem (Consistency, Availability, Partition tolerance)

When to use:

  1. Real time analytics
  2. High speed logging
  3. Semi structured data management

When not to use:

  1. Highly transactional applications that need strong ACID properties (Atomicity, Consistency, Isolation, and Durability). An RDBMS is preferred in this use case.
  2. Operating on data sets involving relations, foreign keys, etc.

HBase:

HBase is an open source, non-relational, distributed column-family database.

Key features:

  1. It provides a fault-tolerant way of storing large quantities of sparse data (small amounts of information caught within a large collection of empty or unimportant data, such as finding the 50 largest items in a group of 2 billion records, or finding the non-zero items representing less than 0.1% of a huge collection)
  2. Supports variable schema where each row is different
  3. Can serve as the input and output for MapReduce jobs
  4. Compression, in-memory operation, and Bloom filters on a per-column basis (a Bloom filter is a data structure designed to tell you, rapidly and memory-efficiently, whether an element may be present in a set)
  5. Achieves CP under CAP
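Since the Bloom filter gets only a one-line definition above, here is a minimal sketch of the idea in Python. It is not HBase's implementation; the class name, the default sizes, and the choice of SHA-256 as the hash are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive num_hashes positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))
```

The payoff is that a "definitely absent" answer lets a store skip reading a file from disk for a key it cannot contain, at the cost of a small, tunable false-positive rate.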

When to use HBase:

  1. If you’re loading data by key, searching data by key (or range), serving data by key, querying data by key
  2. Storing data by row that doesn’t conform well to a schema (variable schema)
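The two points above (access by key or key range, and rows that don't share a schema) can be sketched with a toy in-memory store. This is plain Python, not the HBase client API; `SortedRowStore`, its methods, and the `family:qualifier` style column names are hypothetical and only meant to show the access pattern.

```python
import bisect


class SortedRowStore:
    """Toy key-ordered store: rows are sparse dicts and row keys are kept sorted."""

    def __init__(self):
        self._keys = []  # sorted list of row keys
        self._rows = {}  # row key -> {column name: value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Point lookup by key; returns None if the row does not exist."""
        return self._rows.get(row_key)

    def scan(self, start_key, stop_key):
        """Yield (key, row) for start_key <= key < stop_key, in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]
```

For example, after `put("user#001", "info:name", "Ann")` and `put("user#002", "info:email", "b@example.com")`, a `scan("user#", "user#~")` walks just the user rows in order, and each row carries only the columns it actually has. Anything that cannot be phrased as a key or key-range lookup ends up as a full scan, which is the weak spot listed next.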

When not to use HBase:

  1. For relational analytics
  2. Full table scans
  3. Data to be aggregated or analyzed by rows instead of columns

Neo4j:

Neo4j is a graph database that uses the property graph data model (data is stored as a graph of nodes and relationships, both of which can carry properties).

Key features:

  1. Supports full ACID (Atomicity, Consistency, Isolation, and Durability) rules
  2. Supports indexes by using Apache Lucene
  3. Schema free, bottom-up data model design
  4. High scalability has been achieved due to compact storage and memory caching available for graphs

When to use:

  1. Master data management
  2. Network and IT Operations
  3. Real time recommendations
  4. Fraud detection
  5. Social networks (like Facebook)

When not to use:

  1. Bulk queries/Scans
  2. If your application requires Partitioning & Sharding of data

Have a look at the comparison of various NoSQL technologies in this article.

Sources:

Wiki, SlideShare, Cloudera, Tutorials Point, Neo4j

Pretty decent article here on MongoDB and NoRM (.NET extensions for MongoDB): http://lukencode.com/2010/07/09/getting-started-with-mongodb-and-norm/

Licensed under: CC-BY-SA with attribution