Why aren't OODBMS as widespread as RDBMS?

https://stackoverflow.com/questions/1350044

20-09-2019

Question

Why are relational databases more common than object-oriented databases?

If the Object Oriented Programming paradigm is so widespread, shouldn't we see lots of OODBMS? Wouldn't they perform better than the RDBMS+OR/M?

Solution

One reason that RDBMS has retained popularity is that it's established technology, well understood and has a standard language (SQL) that multiple vendors support. It also has a few good interfaces like ODBC and JDBC that make it connect with different languages pretty well. A stable API is a strong factor in keeping a technology dominant.

By contrast, there is no clear model for OODBMS, nor is there a standard language, nor is there a standard API. There's not even a de facto standard set by a leading vendor implementation.

The OODBMS concept might perform better than RDBMS+ORM. It depends entirely on the implementation. But it's also true that OODBMS don't solve the same set of problems that RDBMS are good at solving. Some data management tasks are much easier if you have referential integrity and relational headers enforced by the data management solution. These features are absent in the OODBMS model (at least so far).
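To make the referential integrity point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customer and invoice tables and the try_insert_orphan helper are made up for illustration; the point is only that the database itself, not the application code, refuses a row that references a parent that doesn't exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE invoice ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " total REAL NOT NULL)"
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Acme')")
conn.execute("INSERT INTO invoice (customer_id, total) VALUES (1, 99.5)")


def try_insert_orphan(connection):
    """Return True if the database rejects an invoice for a missing customer."""
    try:
        # Customer 42 does not exist (hypothetical id chosen for the demo).
        connection.execute(
            "INSERT INTO invoice (customer_id, total) VALUES (42, 10.0)"
        )
    except sqlite3.IntegrityError:
        return True
    return False


rejected = try_insert_orphan(conn)
print("orphan invoice rejected:", rejected)
```

With an object store that has no such constraint, the equivalent check would have to live in every program that writes the data.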

There's a lot of noise on blogs that relational databases are obsolete, but RDBMS are nevertheless the best general-purpose solution for a great majority of data management tasks.

OTHER TIPS

The biggest problem I've seen is lack of standardization. In the RDBMS world, you can get pretty far with any random database if you know SQL. They basically all implement it, with minor variations. I don't know a single existing RDBMS that doesn't do SQL: you can almost use "RDBMS" and "SQL" interchangeably.

The closest thing for an OODBMS is perhaps OQL, which has been an utter failure.

No database has ever implemented very much of it. I used a pretty nice commercial OODBMS a couple years ago, but (as of 2007 or so, and it was on major version 8 or 9) it didn't even support querying for an object by its name. The manual said simply that this part of OQL they hadn't gotten around to yet. (I'm not sure, but you might have been able to drop down into a native call to do that.)

Most object databases I've seen recently have native language interfaces rather than a query language like OQL. The system I used, for example, supported (only!) Perl and VB, IIRC. Limiting your audience to only a couple languages (or forcing them to write wrappers, as we did) is not the way to win friends.

Because of this, there's no competition, and therefore no easy backup plan. If you put your data in MS-SQL and Microsoft stops supporting it, you can probably dump your data into Postgres and port your queries without too much trouble. (It might be a lot of work, if you have a lot of queries, but I don't doubt you could do it. It's a pain, but not technically challenging.) Or Oracle, or MySQL, or many others, both commercial and free.

No such thing exists with an OODBMS: if the one you're using goes belly-up, or they take it in a direction that's not useful to you, or you find it lacks a key feature you need, you can't just dump your data into a competing OODBMS and port your queries. Instead, you're talking about changing a core library and making massive architecture changes. So realistically, you're limited to a commercial OODBMS from a vendor you really trust (can you name even one?), or an open-source OODBMS which you trust your team to maintain when things go bad.

If this sounds like FUD, sorry, I didn't intend that. But I've been there, and from a project management perspective I'd hesitate to go back, even though the programming environment can be wonderful. Another way to think of it is: look at how popular functional programming is today, despite what a good idea it is. OODBMS are like that, but worse, since it's not just your code, but your code and your data. I'd gladly start a major project in Erlang today, but I'd still hesitate to use an OODBMS.

OODBMS vendors: to change this, you need to make it easy to leave you for your competitors. You could dig up OQL and actually implement that, or do it at the API level like ODBC, or whatever. Even a standard dump format (using JSON?), and tools for import/export to/from that for several OODBMSs, would be a great start.

Data often lives longer and is more important than the program. So even if you start a greenfield development today, you have to consider the overall picture. There are more tools, processes and experienced people working with RDBMS. Think beyond the program: how about capacity planning, data mining, reporting, ETL, integration with other data sources, etc.? How about your company acquiring another company and thus bringing all their relational data into your program? RDBMS and associated tools are so entrenched, proven and powerful that I don't think there is any strategic sense in using anything else. In some small niche maybe, but not in general.

Object databases have a very nice niche for problems like representing geometry, e.g. CAD systems, where object graphs can be very deep indeed. In my experience JOIN performance tends to degrade rapidly once a query spans around 7 tables in most relational systems, so deeply self-referential structures in CAD perform better in object databases.

But important applications like financial data lend themselves to a relational representation. The relational model has a firm mathematical basis, and SQL is a successful and popular language. There is little incentive for financial institutions like banks, brokerages, and insurance companies to switch away from RDBMS.

For trivial examples OODBs and RDBs may be very different, especially if you are working with a small enough amount of data that you can trivially read it all into memory at once and write it out all at once. But ultimately OODBs need to save data in a very RDB-like format, so they are not so different.

Consider an arbitrary graph of objects as might be used in an application. Each object may be referenced by several other objects. When you save a graph of objects, you don't want to save objects repeatedly each time they are referenced. For one thing, if you had any kind of loop or self-reference, your save-object method would go into an infinite loop. But in the general case, it's a waste of space. Instead, any significant data store needs to declare a unique identifier for each object being stored (a key, usually a surrogate key in RDBMS terms). Each other object that references it saves the object type and key; it does not save the whole object repeatedly. So here we have recreated foreign keys in our non-RDB object store.
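As a rough sketch of that idea (the Node class and save_graph function are hypothetical names, not from any particular object store): each object gets a surrogate key the first time it is seen, and every reference is written out as a key rather than as a nested copy. The "already seen" check is what keeps a cycle from looping forever.

```python
import json


class Node:
    def __init__(self, name):
        self.name = name
        self.links = []  # references to other Node objects


def save_graph(root):
    """Flatten a graph into {key: record}, storing references as keys."""
    keys = {}   # id(node) -> surrogate key
    order = []  # nodes in the order they were first seen
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in keys:
            continue  # already assigned a key; this is what stops loops
        keys[id(node)] = len(keys) + 1
        order.append(node)
        stack.extend(node.links)
    # Second pass: every node now has a key, so links can be written as keys.
    return {
        keys[id(n)]: {
            "type": type(n).__name__,
            "name": n.name,
            "links": [keys[id(m)] for m in n.links],
        }
        for n in order
    }


a, b, c = Node("a"), Node("b"), Node("c")
a.links = [b, c]
b.links = [c]
c.links = [a]  # cycle back to the root

saved = save_graph(a)
print(json.dumps(saved, indent=2))
```

Each record is stored exactly once even though c is referenced twice, and the "links" lists are foreign keys in all but name.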

Next, pretend that we want to store a list of objects (A1, A2, A3...) that are related to another object (B). We already established that we will store keys instead of saving the objects themselves twice. But do you store the keys to objects A1, A2, A3... on object B, or do you store the key to object B on A? If you store them the first way, and you have all the A's you want, you can quickly grab the relevant B's. The second way, the reverse is true. But either way is a one-way deal. If you want to query the reverse of what you stored and your objects are stored as XML or JSON, that's a lot of inefficient parsing through mostly irrelevant information to find the key in each file. Wouldn't it be better to store them in a format where each field was separated, like columns in a table?
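Here is a small illustration of the "one-way deal", assuming the keys of the A objects are stored on each B record (the b_records layout and both function names are invented for the example). Looking up in the stored direction is a single access; the reverse direction has to inspect every record.

```python
# Each B record stores the keys of its related A records (one direction only).
b_records = {
    "B1": {"a_keys": ["A1", "A2"]},
    "B2": {"a_keys": ["A3"]},
}


def a_keys_for(b_key):
    """Stored direction: a single dictionary access."""
    return b_records[b_key]["a_keys"]


def b_keys_for(a_key):
    """Reverse direction: has to scan every B record to find a match."""
    return [k for k, rec in b_records.items() if a_key in rec["a_keys"]]


print(a_keys_for("B1"))
print(b_keys_for("A3"))
```

With records held as separate XML or JSON files instead of an in-memory dict, that scan also means opening and parsing every file.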

In a many-to-many relationship, or a case where you need to find a large number of objects in both directions, this strategy becomes very inefficient. The only performant solution is to make a helper object to store the relationship, with one file for each relationship such that the file consists of the key of A and the key of B so that they can be looked up quickly. We have just reinvented the cross-reference table.
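For comparison, this is roughly what that helper structure looks like when you let a relational database do it. It is a sketch using Python's sqlite3 module with made-up table names (a, b, a_b); the primary key covers lookups starting from A, and the extra index covers lookups starting from B.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT);
    -- The cross-reference table: nothing but a pair of keys per relationship.
    CREATE TABLE a_b (
        a_id INTEGER NOT NULL REFERENCES a(id),
        b_id INTEGER NOT NULL REFERENCES b(id),
        PRIMARY KEY (a_id, b_id)
    );
    -- Second index so lookups starting from B are as cheap as those from A.
    CREATE INDEX a_b_by_b ON a_b (b_id, a_id);
""")
conn.executemany("INSERT INTO a VALUES (?, ?)", [(1, "A1"), (2, "A2"), (3, "A3")])
conn.executemany("INSERT INTO b VALUES (?, ?)", [(1, "B1"), (2, "B2")])
conn.executemany("INSERT INTO a_b VALUES (?, ?)", [(1, 1), (2, 1), (2, 2), (3, 2)])


def names_of_a_for(b_name):
    """All A names related to the given B."""
    rows = conn.execute(
        "SELECT a.name FROM a"
        " JOIN a_b ON a.id = a_b.a_id"
        " JOIN b ON b.id = a_b.b_id"
        " WHERE b.name = ? ORDER BY a.name",
        (b_name,),
    )
    return [row[0] for row in rows]


def names_of_b_for(a_name):
    """All B names related to the given A (the reverse direction)."""
    rows = conn.execute(
        "SELECT b.name FROM b"
        " JOIN a_b ON b.id = a_b.b_id"
        " JOIN a ON a.id = a_b.a_id"
        " WHERE a.name = ? ORDER BY b.name",
        (a_name,),
    )
    return [row[0] for row in rows]


print(names_of_a_for("B1"))
print(names_of_b_for("A2"))
```

Both directions are the same kind of query, which is exactly what the one-way key lists could not give you.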

Tables with columns, unique identifiers (keys), cross-reference tables... These are the basic needs for storing objects in a way that they can be retrieved efficiently. Hmm... Does that sound like anything familiar? A Relational Database provides exactly this functionality. Plus, multiple vendors have competed for decades to provide the fastest data storage and retrieval with the best tools for backup, replication, clustering, querying, etc. That's a lot for a new technology to compete with. And ultimately I'm saying that RDBMS are basically a really good solution to the problem of efficient object storage.

This is why something like Hibernate exists - to put an object-oriented interface on an efficient RDBMS storage system. Other kinds of storage really shine in different problem areas:

For any kind of unstructured document storage (blogs, source control, or anything that can't map to rows and columns), various NoSQL databases are ideal.
Keeping an easy-to-query yet meaningful history of changes (like diffs in source control) is not really pretty in RDBs. Something like Datomic may be forging new territory here.
Any time your object graph is simple or small, the overhead of a database may not be necessary.

OODBs cannot perform better than RDBs because they are not fundamentally different.
RDBs are here to stay because saving large graphs of objects in a way that is space-efficient and time-efficient for both saving and retrieval, fault tolerant, and backed by some guarantee of data integrity is the problem that RDBs were designed to solve in the first place. This is why JPA and Hibernate are here to stay as well - because they bridge the gap between the object and relational models of data. The object model is for ease of manipulation in memory, and the relational model is for persistence.
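To show the shape of that bridge without pulling in a real ORM, here is a deliberately tiny hand-rolled mapper in Python. This is not how Hibernate or JPA work internally; Person and PersonRepository are hypothetical names, and the sketch only shows the division of labour: a plain object in memory, a row in a table for persistence.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    name: str
    age: int
    id: Optional[int] = None  # filled in once the row has been stored


class PersonRepository:
    """Maps Person objects to rows of a 'person' table and back."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS person ("
            " id INTEGER PRIMARY KEY,"
            " name TEXT NOT NULL,"
            " age INTEGER NOT NULL)"
        )

    def add(self, person):
        cur = self.conn.execute(
            "INSERT INTO person (name, age) VALUES (?, ?)",
            (person.name, person.age),
        )
        person.id = cur.lastrowid  # the surrogate key chosen by the database
        return person

    def get(self, person_id):
        row = self.conn.execute(
            "SELECT name, age, id FROM person WHERE id = ?", (person_id,)
        ).fetchone()
        return Person(*row) if row else None


repo = PersonRepository(sqlite3.connect(":memory:"))
alice = repo.add(Person("Alice", 30))
loaded = repo.get(alice.id)
print(loaded)
```

The application code only ever touches Person objects; everything relational stays behind the repository.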

In a word: interoperability (big word on a Friday night <G>)

Most businesses have to work with legacy systems running on RDBMS. If they were to use OODBMS, they would still need access to RDBMS for certain functions. It's easier to maintain one way of accessing data than two.

When you have big names like Oracle and SQL Server in the OODBMS world and proven performance in a variety of environments, THEN you'll see more projects using them.

I think it is a case of

If it ain't broke don't change it.

Relational databases are extremely ingrained.

Licensed under: CC-BY-SA with attribution
