Question

I have been working with relational databases for sometime, but it only recently occurred to me that there must be other types of databases that are non-relational.

What are some examples of non-relational databases, and where/how are they used in the real world? Why would you choose to use a non-relational database over relational databases?

Edit: Two other similar questions have been mentioned in the answers:

Was it helpful?

Solution

  • Flat file
    • CSV or other delimited data
    • spreadsheets
    • /etc/passwd
    • mbox mail files
  • Hierarchical
    • Windows Registry
    • Subversion using the file system, FSFS, instead of Berkley DB

OTHER TIPS

An admittedly obscure but interesting alternative to the types of databases mentioned here is the associative database, such as Sentences, from LazySoft Technology. There is a free personal version you can download and try on your own. The Enterprise Edition is also free, but requires a request to the company.

Essentially, an associative database allows you to store information in much the same way as our brains do: as things and associations between those things. The name "Sentences" comes from the way this information can be represented in a subject-verb-object syntax:

  • Tom is brother to Laura
  • San Francisco is located in California
  • Mike has a credit limit of $10,000

A sentence may be the subject or object of another sentence:

  • (Bus 570 arrives at 8:15am) on Sundays
  • Mary says (the pie was baked by William)

So, everything can be boiled down to entities and associations.

There is, of course, much more to Sentences than what can be expressed here. I recommend that you take some time to read more about it in a white paper from LazySoft.

"The Associative Model of Data" is a book available in PDF format by Simon Williams, one of the creators of Sentences.

A non-relational document oriented database we have been looking at is Apache CouchDB.

Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API. Among other features, it provides robust, incremental replication with bi-directional conflict detection and resolution, and is queryable and indexable using a table-oriented view engine with JavaScript acting as the default view definition language.

Our interest was in providing a distributed access user preferences store that would be immune to shape changes to which we could serialize preference objects from Java and access those just as easily with Javascript from a XULRunner based client application.

Any database that claims to be a "Berkley style Database" or "Key/Value" Database is not relational.

These databases are usually based off complex hashing algorithms and provide a very fast lookup O(1) based off a key, but leave any form of relational goodness to end user.

For example, in a relational database, you would normalize your structure and join many tables together to create a single result set.

In a key/value database, you would denormalize as much as possible and then use a unique key to lookup data.

If you need to pull data from two sources, you would have to join the resulting set together by hand.

All databases were originally non-relational, it was only with the arrival of DB2 and Oracle in the mid 1980's that they became common. Before that most databases where either flat files or hierarchical.

Flat files are inherently boring, but hierarchical database are much less so, particularly as DB2 was actually implemented on top of an hierarchical implementation (namely VSAM) in the first instance. VSAM is I believe still around on mainframe systems and is of some considerable importance.

DB/1 (so obscure now I can't even find a wikipedia link) was IBM's predecessor prime-time database to DB2 (hence the name). This was hierarchical - basically you had a file which consisted of any number or 'root' records, generally directly accessible by a key. Each root record could then have any number of child records off it, each of which could in turn have it's own children. The net effect is a index file or root records with each root being the top of a potential tree-like structure. Accessing the child records could be tricky - there were limitations of direct access so usually you ended up traversing the tree looking for the record you needed. A 'database' could have any number of these files in it, usually related by keys.

This had major disadvantages - not least that actually doing anything demanded a full program written - basically the equivalent of a days work for what we can now do in SQL in a few minutes. However it really did score on execution speed, in those days a mainframe had about the processing power of your iPhone (albeit optimized for data I/O) and poor DB2 queries could kill a multi-million dollar installation dead. This was never an issue with DB/1 and in a world where programmers were less expensive than CPU time it made sense.

Google App Engine Datastore :

The App Engine datastore is not a relational database. While the datastore interface has many of the same features of traditional databases, the datastore's unique characteristics imply a different way of designing and managing data to take advantage of the ability to scale automatically.

The PI historical database from OSIsoft is non-relational. It's only made to archive timestamped data. It's used a lot by industry, especially as the back-end database for all those 'dashboards'.

There's no need to be relational in it, since there are no joins.

Other two types of databases that haven't come up yet:

  1. Content Repositories are databases designed for content (i.e. files, documents, images, etc). They typically have additioan constructs such as a hierarchical way to browse content, search, transformation between different formats, versioning, and many other things. Examples - Alfresco, Documentum, JackRabbit, Day, OpenText, many other ECM vendors.

  2. Directories, i.e. Active Directory, or LDAP Directories. These are databases designed for low-write / high read scenarios and highly distributed across high geographical distances / high latency connections. While mostly used for authentication / authorization, they don't have to be if your use case matches the requirements.

Dimensional Databases are great examples of non-relational databases. They are very commonly used for 'Business Dashboards'/'Business Intelligence' for KPIs and other types of aggregate or statistical data. They are usually populated from relational databases and can offer better performance in certain situations.

http://en.wikipedia.org/wiki/Dimensional_database

  1. XML databases e.g. xindice
  2. Object databases e.g. db4o

Be aware that the concept of relational databases is highly contentious. Purists such as C. J. Date would argue that many databases in common use (such as Oracle and SQL Server) do not comply sufficiently with the relational model to be termed 'relational'.

Non-Relational databases just do not meet Codd’s requirements. Intersystems Caché seams a total re-write/re-design of the old Pick Operating system’s database. From the little I’ve read of Caché it appears to be a nicely done redesign. It permits .net programs to access the database just like it would SQL. Caché’s run’s the Pick OS programs without requiring any changes. By importing your Pick files into Caché you can still run your old green screen applications with it, but also write new programs using .net so you can migrate to Windows Applications without abandoning the years of data design you’ve already invested in. Here is some background on the Pick DB model . A Pick database uses totally variable length records, and fields. All table are keyed by a single unique key and are accessible without reading an index. Pick designed the system to use a Hashing algorithm that reads the item from disk on generally on the 1st physical read (assuming system maintenance was performed correctly). Fields in Pick are Non-Typed. All data is stored as string and Casting is up to the programmer. Nulls are stored as a empty string, thus a null does not take up disk space as it does in SQL. There is no need for Foreign Keys. In the ‘Relational world’ the DBA has to create and Order Header table, and an Order Line Item table. In the “Pick Model’ there is a single table. An example would be, ‘Order Date’ is a field that would store a # of days since ‘Dec 13, 1967’ (the data Pick OS was turned on for the first time). Pick programmers did not have Y2k problems. A 2nd column would be Customer Number. The big difference is when you get to the Product Number Column, it would be ‘Multi-Valued’ (the Codd Non-Conformance). In other words, the database can handle 1-32000 product#s in that column. Other columns like Quantity Ordered would be in a controlling/dependant relationship with the Product Number and would also be multi-valued. When you get to the Quantity Shipped, Pick would go to a third dimension and have Sub-Multi-Valued field. You would have a Shipment Number Column, and it would be Multi-Valued by Line Item and Sub-Multi-Valued containing The Shipment Quantity for that line for that shipment number. There are No Inner Joins needed. All data for that Order is stored in One Table and in a single record. No orphan rows ever! Secondly the data definition is a bit different. Our dictionaries can contain definitions for data that is not in this table or is being manipulated. A couple of examples are, Customer Name. It would be defined as ‘Use the Customer number column and return the Name field from the Customer Table. Another Example is Line Item Extension would be defined as a calculation of Quantity*Price/PricePer. I believe I read somewhere Caché claims to have over 100,000 installations.

I would think a flat-file database in Excel is non-relational and used by quite a few people.

It is really just a database table that can not be joined with other tables.

Object-Oriented Databases are one interesting type of non-relational database.

The trading sector sometimes uses OO Databases since each deal/contract can look sort of like others in that category but have unique attributes as well. VERY difficult to represent it relationally.

eXist-db is an xml database that has been around for a long time. It is particularly useful for xquery over tons of xml documents.

Any file or group of files that contains data but does not express relationships within that data is a non-relational database.

RRDtool is designed to store and aggregate log data. You configure a sampling interval and feed data into it, then it returns time-based results. It's optimized for fixed-size storage, and starts aggregating past results after a time. For example, suppose you have a round-robin database with a 5-minute time interval. Even if you send it temperature data once per second, it still only stores the results in 5-minute increments. After a week, it averages those results into hourly values. After a month, hourly results are averaged into daily numbers, and so on.

RRDtool is commonly used as the backend for tools like Cricket and MRTG to track network and environmental data for months and years at a stretch.

For a graph based dbms you have neo4j

For a hierarichal dbms you have any standard filesystem or with "schema" support any LDAP implementation.

There are many answers but they all end up being in one of two major categories:

  1. Navigational. Includes Tree/Hierarchy databases and Graph databases.

  2. Databases that break first normal form (multiple values). Includes Pick databases and Lotus Notes and its offspring like CouchDB.

EDIT: And of course key/value stores like BDB aren't relational, but that just goes without saying doesn't it? I mean, they're just key/value stores.

dBase. Although it was marketed as such, it doesn't meet the requirements.

As an OO database, Intersystems Caché comes to mind. Some medical and library systems are built on this.

  1. In my company, www.smartsgroup.com, we have a proprietary database engine we call a "transaction log database". It is built on flat files, each file containing a sequence of "events" or "messages", in binary format, plus various indexes on this data and algorithms for reproducing the state of a stock exchange's orderbook. It is highly optimised for sequential updates and sequential access.

  2. In scientific applications, it is also common to use proprietary database engines rather than RDBMS's. I also worked for a company that has the world's largest database of EEG brain recordings: www.brainresource.com . There we use a flat file database, and it worked well for us.

  3. SmartsGroup also uses a temporal database, which is like a non-relational database table, except that we store a history of all changes to all fields so we can reproduce the state of a particular row on a particular date.

The Wiki page for Dimensional Databases linked to above seems to have disappeared.

Some OLAP Systems have are backed by Multidimensional databases (MOLAP) these are used often in financial analysis. They afford interactive clients that allow one to navigate through the data at different levels of aggregation.

At my university there is a group that researches deductive databases.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top