Question

I was reading over the Citadel documentation and it mentioned that it uses BerkeleyDB to store its data. Since BerkeleyDB is a key/value store, I'm wondering how they can manage all the data relations (since Citadel does a lot of things) using such a simple data model.

CREATE TABLE citadel (
  `key` LONGBLOB,
  data  LONGBLOB,
  INDEX (`key`(255))
);

This presents a chance for me to finally see a full application modeled out using a NoSQL database. Yet, I couldn't find any documentation on how they do this.

So, how does Citadel structure its data using only the BerkeleyDB key/value store?

  • How does it map emails to users?
  • How are users related to other users?
  • How are contacts stored?
  • How are related email replies found?
  • How are emails marked as viewed?

and the list goes on, and on...


Solution

Quite a few NoSQL databases are, in their bare form, comparable to file systems. Given a key (= path), you get a blob of data (= file contents). The rest mostly comes down to tuning and extra features:

  • Lots (billions and billions) of keys in one namespace? (HBase, Riak, BerkeleyDB, ...)
  • Support for multi-TB values (Amazon S3), or tuning for lots of small ones (ZooKeeper)?
  • Opaque values? Some databases don't look at them (HBase, BerkeleyDB), others do (CouchDB).
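To make the file-system analogy concrete, here is a minimal sketch in Python. It uses an in-memory dict as a stand-in for the database (an assumption for illustration; this is not BerkeleyDB's actual API, and the `KVStore` class and its keys are hypothetical). Keys play the role of paths, and values are opaque bytes that the store never inspects:

```python
from __future__ import annotations


class KVStore:
    """Hypothetical toy key/value store: bytes keys -> opaque bytes values."""

    def __init__(self) -> None:
        self._data: dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        # Like writing a file: an existing key (path) is simply overwritten.
        self._data[key] = value

    def get(self, key: bytes) -> bytes | None:
        # Like reading a file: return the raw blob, or None if the key is absent.
        return self._data.get(key)


store = KVStore()
# The key looks like a path; the value is whatever bytes the application chose.
store.put(b"users/jdoe/profile", b'{"name": "John Doe"}')
store.put(b"users/jdoe/avatar.png", b"\x89PNG\r\n\x1a\n")
print(store.get(b"users/jdoe/profile"))
```

The store itself knows nothing about JSON or PNG; interpreting the bytes is entirely up to the application.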

Currently the most popular approach seems to be key scans (HBase, Cassandra, CouchDB, and, I believe, BerkeleyDB), where you request an interval of keys you are interested in, e.g. "From foo@bar:emails:folderName:00000000 to foo@bar:emails:folderName:999999999". This usually returns the keys and/or values that fall between the two bounds in byte (ASCII) order. Thus you can emulate a file-like hierarchy in a flat namespace.
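A minimal sketch of such a key scan, again in Python with an in-memory stand-in (the `SortedKVStore` class and the key layout are hypothetical, not Citadel's actual schema). Keys are kept in sorted byte order, as a B-tree would keep them, so "list a folder" becomes "return every key between two bounds":

```python
from __future__ import annotations

import bisect


class SortedKVStore:
    """Hypothetical ordered key/value store: keys kept in byte order, like a B-tree."""

    def __init__(self) -> None:
        self._keys: list[bytes] = []           # always sorted
        self._values: dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            # Insert a new key at its sorted position.
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start: bytes, end: bytes) -> list[tuple[bytes, bytes]]:
        """Return (key, value) pairs with start <= key <= end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


store = SortedKVStore()
store.put(b"foo@bar:emails:inbox:00000002", b"second message")
store.put(b"foo@bar:emails:inbox:00000001", b"first message")
store.put(b"foo@bar:emails:sent:00000001", b"a sent message")
store.put(b"other@bar:emails:inbox:00000001", b"someone else's mail")

# "List the inbox folder of foo@bar": every key between the two bounds.
for key, value in store.scan(b"foo@bar:emails:inbox:00000000",
                             b"foo@bar:emails:inbox:99999999"):
    print(key.decode(), value.decode())
```

The message IDs are zero-padded to a fixed width so that byte order matches numeric order; otherwise "10" would sort before "9". In BerkeleyDB's B-tree access method, a cursor positioned with the `DB_SET_RANGE` flag (smallest key greater than or equal to the one given) can serve as the starting point of a similar scan.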

The next issue is concurrency. Very briefly, many NoSQL databases relax ACID guarantees in favor of scalability and/or availability (BerkeleyDB, notably, can provide ACID transactions). Look into the CAP theorem for more details.

All in all, it is hard to do the subject justice in such a short space, so I would recommend looking into it yourself.

Pick some open-source project apart (OpenTSDB does things in an interesting, yet obvious, manner), or build something on NoSQL yourself.

OTHER TIPS

I dealt with Amazon SimpleDB a while ago, and I suspect BerkeleyDB may work in a somewhat similar way.

Both the key and the value are BLOBs, so essentially you can store anything in them. Let's take an example based on some of the points/questions you listed.

The points I will cover are the following:

  • How does it map emails to users?
  • How are users related to other users?

As with a primary key in a relational database, the key must be unique, so let's assume the user ID/user name is unique. Thus we can use values such as admin, jdoe, serviceadmin, etc. as keys. Since we can store anything in the value field, we can, for example, store an XML document there.

An example might look like this:

Key:
    admin
Value:
    <user>
        <emaillist>
            <email>admin@server.com</email>
        </emaillist>
        <relatedusers>
            <relateduser>
                <name>jdoe</name>
                <relationship>someidentifier</relationship>
            </relateduser>
            <relateduser>
                <name>serviceadmin</name>
                <relationship>someidentifier</relationship>
            </relateduser>
        </relatedusers>
    </user>
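Reading such a value back is then the application's job. A minimal sketch in Python, assuming a plain dict stands in for the store (a hypothetical setup, not Citadel's code) and using the standard library's `xml.etree.ElementTree` to answer "which emails does this user have?" and "who is this user related to?":

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical store contents: user name (key) -> XML document (value, as bytes).
store = {
    "admin": b"""<user>
  <emaillist><email>admin@server.com</email></emaillist>
  <relatedusers>
    <relateduser><name>jdoe</name><relationship>someidentifier</relationship></relateduser>
    <relateduser><name>serviceadmin</name><relationship>someidentifier</relationship></relateduser>
  </relatedusers>
</user>""",
}


def emails_of(user: str) -> list[str]:
    # Fetch the blob by key, parse it, and pull out every <email> element.
    root = ET.fromstring(store[user])
    return [e.text for e in root.findall("./emaillist/email")]


def related_users(user: str) -> list[tuple[str, str]]:
    # Each <relateduser> yields a (name, relationship) pair; the name is
    # itself a key that can be looked up in the same store.
    root = ET.fromstring(store[user])
    return [(r.findtext("name"), r.findtext("relationship"))
            for r in root.findall("./relatedusers/relateduser")]


print(emails_of("admin"))
print(related_users("admin"))
```

The "relation" is nothing more than a stored key name; following it means doing another lookup yourself.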

Since XML can describe data in a variety of ways, this is only a very simple example of what can be achieved. You could also store a binary format carrying similar information, which you retrieve and interpret in your own way, e.g. bit 1 is the active state of the user, etc.
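A minimal sketch of such a binary value in Python, using the standard `struct` module. The record layout (a flags byte whose lowest bit means "active", a 4-byte user ID, and a length-prefixed UTF-8 email) is entirely hypothetical; the point is that only code that knows the layout can make sense of the bytes:

```python
from __future__ import annotations

import struct

# Hypothetical record layout (big-endian, no padding):
#   1 byte  flags        (lowest bit = user is active)
#   4 bytes user id      (unsigned int)
#   2 bytes email length (unsigned short), followed by the UTF-8 email bytes
HEADER = struct.Struct(">BIH")
FLAG_ACTIVE = 0x01


def encode_user(user_id: int, email: str, active: bool) -> bytes:
    email_bytes = email.encode("utf-8")
    flags = FLAG_ACTIVE if active else 0
    return HEADER.pack(flags, user_id, len(email_bytes)) + email_bytes


def decode_user(blob: bytes) -> tuple[int, str, bool]:
    # Read the fixed-size header, then slice out the variable-length email.
    flags, user_id, email_len = HEADER.unpack_from(blob, 0)
    start = HEADER.size
    email = blob[start:start + email_len].decode("utf-8")
    return user_id, email, bool(flags & FLAG_ACTIVE)


blob = encode_user(42, "admin@server.com", active=True)
print(len(blob), decode_user(blob))
```

This is more compact than XML, but without knowing the layout the stored value is just an unreadable run of bytes.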

The power of NoSQL is that you can store anything, and the structure can differ from row to row. This is also the downside: since there is no way to interpret the data without proper documentation, these types of databases are hard to understand from a structural point of view. They can literally contain anything.

Hope it makes sense to some extent now.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow