Question

I have been looking for cloud computing / storage solutions for a long time (inspired by Google Bigtable), but I can't find an easy-to-use, business-ready solution.

I'm searching for a simple, fault-tolerant, distributed Key=>Value DB like SimpleDB from Amazon.

I've seen things like:

  1. The CouchDB Project : Simple, distributed, fault-tolerant database. But it understands only JSON. No XML connectors etc.
  2. Eucalyptus : Nice Amazon EC2 interfaces. Open standards & XML. But less distributed and less fault-tolerant? There are also a lot of open tickets with Xen/VMware issues.
  3. Cloudstore / Kosmosfs : Nice distributed, fault-tolerant fs. But it's hard to configure. Are there any Java connectors?
  4. Apache Hadoop : Nice system that does much more than store data. Uses its own Hadoop Distributed File System and has been tested on clusters with 2000 nodes.
  5. Amazon SimpleDB : Can't find an open-source alternative! It's a nice but expensive system for huge amounts of data. And you're dependent on Amazon.

Are there other, better solutions out there? Which one is the best to choose? Which one has the fewest SPOFs (Single Points of Failure)?


Solution

MongoDB is another option. It is very similar to CouchDB, but it uses a query language that is very similar to SQL instead of map/reduce in JavaScript. It also supports indexes, query profiling, replication and storage of binary data.

It has a huge amount of documentation, which might be overwhelming at first, so I would suggest starting with the Developer's Tour.

OTHER TIPS

How about memcached?

The High Scalability blog covers this issue; if there's an open source solution for what you're after, it'll surely be there.

Other projects include:

Another good list: Anti-RDBMS: A list of distributed key-value stores

Wikipedia says that Yahoo both contributes to Hadoop and uses it in production (article linked from Wikipedia). So I'd say it counts as business-proven, although I'm not sure whether it counts as a K/V database.

Not on your list is the Friendfeed system of using MySQL as a simple schema-less key/value store.

It's hard for me to understand your priorities. CouchDB is simple, fault-tolerant, and distributed, but somehow you exclude it because it doesn't have XML. Are XML and Java connectors an unstated requirement?

(Anyway, CouchDB should in fact be excluded because it's young, its API isn't stable, and it's not a key-value store.)

I use Google's Google Base API. It's XML-based, free, documented, cloud-based, and has connectors for many languages. I think it will fit the bill if you want free hosting too.

Now, if you want to host your own servers, Tokyo Cabinet is your answer. It's key=>value based, uses flat files, and is the fastest database out there right now (very barebones compared to, say, Oracle, but incredibly good at storing and accessing data: about 1 million records per second, with about 10 bytes of overhead, depending on the storage engine). As for business-ready: Tokyo Cabinet is the heart of a service called Mixi, which is Japan's equivalent of Facebook+MySpace, with several million heavy users, so it's actually very battle-proven.

If you want something like Bigtable, you can't go past HBase or Hypertable - they're both open-source Bigtable clones. One thing to consider, though, is if your requirements really are 'big enough' for Bigtable. It scales up to thousands of tablet servers, and as such, has quite a bit of infrastructure under it to enable that (for example, handling the expectation of regular node failures).

If you don't anticipate growing to, at the very least, tens of tablet servers, you might want to consider one of the proposed alternatives: you can't beat Berkeley DB for simplicity, or MySQL for ubiquity. If all you need is a key/value datastore, you can put a simple 'dict' wrapper around your database interface, and switch out your backend if you outgrow one.
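A minimal sketch of such a 'dict' wrapper, assuming Python and using the standard library's sqlite3 as a stand-in backend. The class name SqliteDict and the table layout are made up for illustration; the point is that calling code only sees the mapping interface, so the backend behind it can be replaced later.

```python
import sqlite3
from collections.abc import MutableMapping


class SqliteDict(MutableMapping):
    """Dict-like key/value store backed by a single SQLite table.

    Hypothetical illustration: callers depend only on the mapping
    interface, not on the storage engine behind it.
    """

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def __getitem__(self, key):
        row = self._db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        self._db.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )
        self._db.commit()

    def __delitem__(self, key):
        cur = self._db.execute("DELETE FROM kv WHERE k = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)
        self._db.commit()

    def __iter__(self):
        # Read all keys up front so callers can modify the store while iterating.
        rows = self._db.execute("SELECT k FROM kv").fetchall()
        return iter([k for (k,) in rows])

    def __len__(self):
        return self._db.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
```

Usage is the same as for a plain dict (store["user:1"] = "alice", store.get("user:2"), del store["user:1"]), so a later move to another backend would only mean writing a second class with the same five methods.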

You might want to look at Hypertable, which is modeled after Google's Bigtable.

Use CouchDB

  • What's wrong with JSON?
  • JSON to XML is trivial
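To illustrate the second point, here is a rough sketch of a JSON-to-XML conversion in Python, using only the standard library. The function names, the default root tag "doc" and the "item" tag for list entries are arbitrary choices for this example, and it assumes that every JSON key is a valid XML element name.

```python
import json
import xml.etree.ElementTree as ET


def to_xml(tag, value):
    """Recursively convert a decoded JSON value into an XML element."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_xml(key, child))
    elif isinstance(value, list):
        # JSON arrays have no element names, so use a generic tag (assumption).
        for child in value:
            elem.append(to_xml("item", child))
    elif isinstance(value, bool):
        elem.text = "true" if value else "false"
    elif value is not None:
        elem.text = str(value)
    # None (JSON null) is left as an empty element.
    return elem


def json_to_xml(text, root="doc"):
    """Convert a JSON document (string) into an XML string."""
    return ET.tostring(to_xml(root, json.loads(text)), encoding="unicode")
```

For example, json_to_xml('{"name": "couch", "tags": ["a", "b"]}') wraps each key in an element of the same name and each list entry in an item element. A real mapping would also have to decide how to handle attributes, namespaces and keys that are not legal XML names.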

You might want to take a look at this (using MySQL as key-value store):

http://bret.appspot.com/entry/how-friendfeed-uses-mysql
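A rough sketch of the general idea, loosely modeled on the linked post: each entity is stored as an opaque serialized blob in one table, and any property you want to query by gets its own small index table. This is not FriendFeed's actual code. It uses Python's sqlite3 in place of MySQL and JSON for serialization, and the table and function names are made up for this example.

```python
import json
import sqlite3
import uuid

db = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
# Schema-less storage: the body column holds the whole serialized entity.
db.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
# One separate table per property that needs to be queryable.
db.execute(
    "CREATE TABLE index_user_id ("
    "user_id TEXT NOT NULL, entity_id TEXT NOT NULL, "
    "PRIMARY KEY (user_id, entity_id))"
)


def put_entity(entity):
    """Store a dict as a blob and refresh its index row; returns the id."""
    entity = dict(entity)
    entity.setdefault("id", uuid.uuid4().hex)
    with db:  # commit on success, roll back on error
        db.execute(
            "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
            (entity["id"], json.dumps(entity)),
        )
        db.execute("DELETE FROM index_user_id WHERE entity_id = ?", (entity["id"],))
        if "user_id" in entity:
            db.execute(
                "INSERT INTO index_user_id (user_id, entity_id) VALUES (?, ?)",
                (entity["user_id"], entity["id"]),
            )
    return entity["id"]


def get_entity(entity_id):
    """Fetch one entity by primary key, or None if it does not exist."""
    row = db.execute(
        "SELECT body FROM entities WHERE id = ?", (entity_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def find_by_user(user_id):
    """Look up entities through the index table instead of scanning blobs."""
    rows = db.execute(
        "SELECT e.body FROM index_user_id i "
        "JOIN entities e ON e.id = i.entity_id WHERE i.user_id = ?",
        (user_id,),
    ).fetchall()
    return [json.loads(body) for (body,) in rows]
```

Adding a new property to an entity needs no schema change, because the body is just a blob. Only a new queryable property needs a new index table.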

Cloudera is a company that commercializes Apache Hadoop, with some value-add of course, like productization, configuration, training & support services.

Instead of looking for something inspired by Google's Bigtable, why not just use Bigtable directly? You could write a front end on Google App Engine.

Good compilation of storage tools for your question :

http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/

Tokyo Cabinet has also received some attention, as it supports table schemas, key/value pairs and hash tables. It uses Lua as an embedded scripting platform and HTTP as its communication protocol. Here is a great demonstration.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow