Domanda

It's been a long time since I haven't created a project from scratch, and now document-oriented databases (as well as ODM) have become quite popular, so I have to consider them before blindly going the relational route.

Could anyone try to list the motivations / project criteria that could lead to one choice or another?

È stato utile?

Soluzione

ORM / relational database / SQL

Pros:

  • well-understood, standard approach
  • maps well to data with consistent structure
  • maps well to data with multiple relationships between multiple entities
  • has extensive join capabilities
  • has transactions
  • scalable to huge number of transactions per second (using MySQL Cluster, Fusion-IO, etc)

Cons:

  • hard to scale to huge volume of data, if performance is also a concern
  • does not map well to data with variable structure (or semi-structured)
  • persisting objects requires a glue/translation layer which can be a performance bottleneck (and if done wrong also can be very verbose)

ODM / document database / NoSQL

Pros:

  • scalable to huge volume of data, and to huge numbers of relatively independent queries
  • high availability, sharding, multi-master, ...
  • maps well to semi-structured data
  • maps well to data with more variable structure
  • data model can be more flexible
  • queries do not have to be translated to SQL (the native NoSQL query style may or may not be better suited for some uses, and no overhead from SQL driver/parsing/etc)
  • (for object databases) maps directly to objects, no need for object-relational translation

Cons:

  • often, no join (or limited version of join)
  • often, no transactions (or a limited version of transactional consistency/atomicity)

How to decide

Based on the type of data and usage patterns:

  • Does the data have a uniform structure? (relational) ... or a variable/inconsistent structure? (document)
  • Does typical usage read/write a single type of entity? (document) ... or a view which is composed of properties of multiple entities? (relational)
  • Are transactions required? (relational) ... or no transactions needed? (document)

Based on the scaling/performance requirements:

  • Huge data + few, slow, complex read/writes? (data warehousing type scenario) => relational
  • Huge data + huge volume of simple read/writes? (craigslist backend type scenario) => document
  • Huge data + fast, complex reads/writes? => this is difficult; either use relational and try to scale it up, or use document and try to simplify queries
  • Moderate data + fast transactional writes? (banking type scenario) => relational
  • Moderate data + moderate read/write? => pick any based on vendor/tool support, familiarity, etc

References

(Background: I haven't done anything in this area very recently, but a few years ago I built a large system which used replicated MySQL + Sphinx, ie a relational and document hybrid)

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top