سؤال

Whats the cons and pros of using a object database or relational database for regular web development which involves a lot of CRUD?

UPDATE: I reopened the bounty reward in order to give Neville it.

هل كانت مفيدة؟

المحلول

Relational database:

Pros:

  • Established technology - lots of tools, developers, resources
  • Wide range of Open Source and commercial products
  • Known to scale to very large sites, and very high throughput
  • Expresses many problem domains in a logical and "programmable" way
  • Fairly standard language (SQL)

Cons:

  • Impedance mismatch with OO concepts - modeling "inheritance" in a database is not natural
  • Hierarchical structures usually require vendor-specific extensions to the language
  • Non-relational data (e.g. documents) are not a natural fit
  • Changes in the business domain can be hard to implement once the schema has been defined

OOBDMS

Pros:

  • Closer fit for OO concepts
  • In theory, a developer only needs to work in one language - the persistence details are abstracted away. This should improve productivity

Cons:

  • Significantly fewer tools/resources/developers available.
  • No widely accepted standards
  • "black box" approach to persistence can make performance tuning difficult
  • persistence details often leak into the OO design (see Marcelo's examples)

نصائح أخرى

The concept of an OODBMS is quite broken, and the various commercial and free offerings that have emerged over the last few decades have barely made a dint in the marketplace.

The relational model is more powerful than object models in terms of the kinds of questions you can ask of your data. Unfortunately, SQL threw out much of the expressive power that the relational model is capable of, but even in this diluted form, it is still easier to express queries in SQL than in a typical OO database (be it ORM or OODBMS).

Queries in an OODBMS are predominantly driven by navigational operators, which means that if your sales database has sales people owning their sales, then querying for the monthly sales for a given SKU is not only likely to be cripplingly slow, but very awkward to express. Consider also a security model that grants employees access to buildings. Which is the correct way to express this? Should employees hold a collection of buildings they can access, or should buildings hold a collection of employees that have access to them? More to the point, why should either class have to have a collection of the other baked into its design? And, whichever one you choose, how would you ask which pairs of employees have more than one building they can access in common? There is no straightforward navigational pattern that can answer such a question. The sensible solution — an "Access" object — is essentially a reversion back to a properly normalised relational schema, and it requires some kind of query language that borrows heavily from the relational algebra in order to answer the question without a massive over-the-wire data transfer.

Also consider another major strength touted for the OODBMS: methods, especially inheritance with virtual methods. A sports clinic might have different risk-of-injury metrics for different kinds of athlete. In the ORM world, this would be automatically expressed as a class hierarchy, with Athlete at the root, and a virtual method, int InjuryRiskScore() implemented by each derived class. The problem is that this method is invariably implemented on the client, not at the back end, so if you want to find the 10 highest risk athletes across all sports at your clinic, the only way to do it is to fetch all athletes across the wire and pass them through a client-side priority queue. I don't know the OODBMS world as well, but I think the same problem occurs, since the storage engines generally only store enough data to rehydrate objects in the client's programming language. In the relational model or SQL, you would express risk-of-injury scoring as a view, which could be simply the union of per-athlete-type views. Then, you just ask the question. Or you can ask more complicated questions like, "Who had the greatest increase in their risk-of-injury since last month's checkup?" or even, "Which risk score has proven to be the best predictor of injury over the last year?". Most importantly, these questions can all be answered inside the DBMS with nothing more than the question and the answer travelling across the wire.

The relational model allows the DBMS to express knowledge in a highly distilled manner based on predicate logic, which allows the various dimensions of the facts you store therein to be joined, projected, filtered, grouped, summarised, and otherwise rearranged in a completely ad hoc manner. It allows you to easily cook up the data in ways that weren't anticipated when the system was originally designed. The relational model thus permits the purest expression of knowledge that we know of. In short, the relational model holds pure facts — nothing more, nothing less (and certainly not objects, or proxies thereof).


On a historical note, the relational model emerged in response to a disastrous state of affairs with the existing network and hierarchical DBMSs of the time, and largely (and rightly) displaced them for all but a small niche of application areas (and even these probably remained largely because SQL failed to deliver on the RMs power). It is deeply ironic that much of the industry is now essentially yearning for the "good old days" of network-theoretical databases, which is essentially what OODBMSs and the current crop of NoSQL databases are going back to. These efforts rightly criticise SQL for its failure to deliver on today's needs, but unfortunately they assumed (wrongly, and probably out of pure ignorance) that SQL is a high-fidelity expression of the relational model. Hence they neglected to even consider the relational model itself, which has virtually none of the limitations that has driven so many away from SQL, often towards OODBMSs.

I can answer your question with respect to one Object database I know well: ZODB.

The ZODB allows you to persist your data models almost completely transparently. Its usage amounts to something like:

 # 'Persistent' allows us to save things transparently
 class Car(Persistent):
   def set_wheels(number)
      self.wheels = number

 # 'database' is a ZODB database
 database.car = Car()
 database.car.set_wheels(3)

You'll have to look a long time to find that kind of readability with an RDMBS. That there is the big pro to using ZODB in a web application.

The big downside, as Marcello outlines, is lack of powerful querying. That's partly a side-effect of the convenience of the idiom above. The following is completely OK, and everything will get persisted to the database:

 database.car.colour = "yellow"
 database.car.owner = "Trotter"
 database.car.neighbour = Car()

However, this kind of flexibility makes it hard to optimize complex queries across different models. Just making list of all yellow cars with neighbours will require O(n) time unless you roll your own index.

So, it depends what you mean by "regular web development". Many websites don't actually require complex multi-dimensional queries and searches in linear time are no problem at all. In those cases, using an RDBMS can in my opinion over-complicate your code. I've written many CMS-type applications using solely an object database. Lots of CRUD doesn't particularly come into it; ZODB is very mature, and scales and caches pretty well.

However, if you're writing a web application that needs to do complex business reporting along the lines of Google Analytics, or some kind of warehouse inventory management system with many terabytes of data, then you're pretty definitely going to want an RDBMS.

To summarise, an object database can give you readability and maintainability at the cost of complex query performance. Of course, readability is a matter of opinion, and you can't ignore the fact that very many more developers know SQL than the various object database dialects.

In regular web development I use Seaside on Gemstone. For most applications, that means I write zero database connection code. It performs, it scales, development is about five times faster.

The only time I will ever use a relational database again for web development is when I have to connect to an existing one.

The advantages:

  • less code, faster development;
  • much better scalability;
  • can handle much more complex models;
  • much better project agility;
  • did I mention flexibility;
  • can handle changes in class models, not just data but also code;

The disdadvantages:

  • you'll probably have to train developers yourself;
  • the one you want (gemstone) costs serious money for large systems

Relational db

  • SQL and standards
  • easy to model
  • can use only standard and vendor types
  • referential integrity (matematically solid relational set theory)
  • lot of tools and database implementations
  • data separate from program
  • storage management and high end infrastructure support
  • transaction and concurrency management done within
  • Relational Model is Value-Based ie rows are identified by primary keys

Cons

  • no custom type
  • no extensible data types
  • impedance mismatch
  • cannot express nested relationship
  • cannot use complex entities as a single unit
  • need to define keys and various types of relationships at the data model level
  • write procedures for versioning, transactions if needed

Object DB

  • High performance
  • Faster as no joins required
  • Inherent versioning mechanism
  • Navigational interface for operations (like graph traversal)
  • Object Query Language retrieve objects declaratively
  • complex data types
  • object identity ie. equals() in which object identity is independent of value and updates
  • facilitates object sharing
  • classes and hierarchies (inheritance and encapsulation)
  • support for relationships
  • integrated with a persistence language like ODL
  • support for atomicity
  • support for nested relationships
  • semantic modelling

Cons

  • No mathematical foundation as RDB (refer Codd)
  • cons of object orientation
  • persistence difficult for complex structures, some data must be transient

Object-Relational databases (You might have seen UDTs!)

  • support for complex data types like collection, multisets etc
  • object oriented data modelling
  • extended SQL and rich types
  • support for UDT inhertance
  • powerful query language

Different approaches (OO, Relational DB or OODB) may be necessary for different applications

References

The advantage of using relational databases for large corpora

Relational Database Relational Database

OODMS manifesto

ODMG

Benefits of a relational database

The Object-Oriented Database System Manifesto

Object Oriented Database Systems

Object Relational Databases in DBMS

Completeness Criteria for Object-Relational Database Systems

Comparisons

http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems

http://en.wikipedia.org/wiki/Comparison_of_object_database_management_systems

http://en.wikipedia.org/wiki/Comparison_of_object-relational_database_management_systems

I think that everything depends on the specifics of your issue. (I'm really going out on a limb here, I know.)

All we know is that you want to use the DB for web development, and you'll be doing a lot of operations on the data.

One of the relevant questions to ask yourself is, how important is it that the DB be tightly integrated with the objects you manipulate? The more it's necessary, the more an object-oriented DB recommends itself.

On the other hand, if your data easily lends itself to the relational model, a relational DB might be better.

Think about the operations you'll need to do. Will you need analysis of all sorts of items with different attributes? How much will you need to future-proof your DB?

I should add that if your DB is likely to be fairly small, performance will not be a major issue. But if performance is, in fact, an issue, you have lots of things to worry about beyond just OO vs. relational DBs. (Just to pick one example from the relational DB world, what normalization form should you use? This is an exceedingly important, and complex, question. Are you maintaining an operational system or data warehouse? Do you know ahead of time that certain queries are of paramount importance, or of negligible importance? &c.)

Beyond the question of DB performance and integration with your object model, there are other real-world questions to ask. Do you have diskspace / server / bandwidth limitations? Will you offer only a small number of operations to web users, or might people you don't even know be creating their own queries/edits?

For other, more important, real-world questions, whom will you be working with? What do they already know (or prefer)? And if you don't have domain knowledge yet, maybe you have personal curiosity pushing you in one direction? If you're starting on a personal project, following your own preferences is a better guide to success than worrying over performance before you even start.

If you can answer these and similar questions, even if the answer is "I don't know," you will be able to get much better direction in how to proceed.

In contract to Marcelo's in depth and well thought out response, I'd say that based on the phrasing of your question "regular web development", my off the cuff response would be to say that you'd be hard pressed to find enough pro's to justify using a Object DB over a traditional relational db, for the simple fact that are more resources/developers/tutorials/etc that are more familiar with the traditional relational model, and how to utilize that to achieve "regular web development".

That said, I think that with some of the modern ORMs you get a little of the best of both worlds, in that your underlying data is stored in a well-understood RDBMS (that is likely stable, supported, etc), but you can still abstract away some of the Object modeling capabilities that can (arguably) be more suited to developing CRUD applications.

I'll admit that I'm not well versed in the current capabilities of modern OODBMSs, however unless you are in a field that is completely suited to achieving a perfect object representation of your domain (and you have the object modeling talent to take advantage), then I'd stick with a RDBMS for your persistent storage.

Hope that helps!

This pretty much explains the pros and cons:

http://en.wikipedia.org/wiki/Object_database

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top