Question

I'm searching for any NoSQL system (preferably open source) that supports analytic functions (AF for short) like Oracle/SQL Server/Postgres do. I didn't find any with built-in functions. I've read something about Hive, but it doesn't actually have AF features (windows, first/last values, ntiles, lag, lead and so on), just histograms and ngrams. Also, some NoSQL systems (Redis for example) support map/reduce, but I'm not sure if AF can be replaced with it.

I want to make a performance comparison to choose either Postgres or NoSQL system.

So, in short:

  1. Searching for NoSQL systems with AF
  2. Can I rely on map/reduce to replace AF? Is it fast, reliable, and easy to use?

ps. I tried to make my question more constructive.

Solution

Some functions need knowledge of all existing data, because they involve some kind of aggregation (avg, median, standard deviation) or some ordering (first, last).
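To make that concrete, here is a minimal sketch in Python, with made-up data and hypothetical helper names (partial_avg, merge_avg), of why some of these functions are harder to distribute than others. An average can be rebuilt from small per-node partial results, while a median cannot be obtained by simply combining per-node medians.

```python
from statistics import median

# Hypothetical data set, split across three nodes (assumption for illustration).
nodes = [[1, 2, 3], [4, 5, 6, 7, 8], [100]]
all_values = [v for node in nodes for v in node]

def partial_avg(values):
    # Each node only ships a (sum, count) pair, not its raw data.
    return (sum(values), len(values))

def merge_avg(partials):
    # The coordinator combines the pairs into the exact global average.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

distributed_avg = merge_avg([partial_avg(n) for n in nodes])
global_avg = sum(all_values) / len(all_values)

# The same trick does not work for the median: combining per-node medians
# gives a different answer than the median over all the data.
median_of_medians = median(median(n) for n in nodes)
true_median = median(all_values)

print(distributed_avg == global_avg)    # the average merges exactly
print(median_of_medians, true_median)   # the two medians disagree
```

For functions like the median or first/last, something in the system has to see (or index) all the values for a group, which is where the coordination cost comes from.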

If you want a distributed NoSQL solution that supports AF out of the box, the system will need to rely on some centralized indexing and metadata to keep information about the data on all nodes, which means having a master node and probably a single point of failure.

You have to ask yourself what you expect to accomplish by using NoSQL. Do you want schemaless tables? Distributed data? Better raw performance for very simple queries?

Depending on your needs, I see three main alternatives here:

1 - use a distributed NoSQL system with no single point of failure (e.g. Cassandra) to store your data, and use map/reduce to process the data and produce the results for the desired function (almost every major NoSQL solution supports Hadoop). The caveat is that map/reduce queries are not real-time (a query can take minutes or hours to run) and require extra setup and learning.

2 - use a traditional RDBMS that supports multiple servers, like MySQL Cluster

3 - use a NoSQL system with a master/slave topology that supports ad-hoc and aggregation queries, like MongoDB

As for the second question: yes, you can rely on M/R to replace AF. You can do almost anything with M/R.
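As an illustration, here is a rough in-memory sketch in Python of how a window function such as LAG(sales) OVER (PARTITION BY dept ORDER BY day) could be expressed as a map step and a reduce step. It is not tied to Hadoop or any particular NoSQL product; the rows, the field names and the helper functions (map_phase, shuffle, reduce_phase, run) are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical input rows (assumption for illustration).
rows = [
    {"dept": "a", "day": 1, "sales": 10},
    {"dept": "b", "day": 1, "sales": 7},
    {"dept": "a", "day": 2, "sales": 15},
    {"dept": "a", "day": 3, "sales": 12},
    {"dept": "b", "day": 2, "sales": 9},
]

def map_phase(row):
    # Emit the partition key, with the ordering key first in the value.
    yield row["dept"], (row["day"], row["sales"])

def shuffle(pairs):
    # Group values by key, as a map/reduce framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Inside one partition: sort by day and carry the previous value along.
    previous = None
    for day, sales in sorted(values):
        yield {"dept": key, "day": day, "sales": sales, "prev_sales": previous}
        previous = sales

def run(rows):
    pairs = (pair for row in rows for pair in map_phase(row))
    groups = shuffle(pairs)
    return [out for key in sorted(groups) for out in reduce_phase(key, groups[key])]

result = run(rows)
for line in result:
    print(line)
```

The reducer has to see a whole partition at once, so this approach works best when each partition fits comfortably on one node.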

OTHER TIPS

Once you've really understood how MapReduce works, you can do amazing things with a few lines of code.

Here is a nice video course:

http://code.google.com/intl/fr/edu/submissions/mapreduce-minilecture/listing.html

The real difference in difficulty will be between functions that you can implement with a single MapReduce and those that need chained MapReduces. Moreover, some nice MapReduce implementations (like CouchDB) don't allow you to chain MapReduces (easily).
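To show what chaining means, here is a small Python sketch under the same kind of assumptions (in-memory data, a hypothetical run_job helper standing in for a real framework). The first job sums sales per department; the second job takes the output of the first as its input and ranks the departments by total, which a single pass cannot do because the totals are not known until the first reduce has finished.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    # Hypothetical stand-in for one MapReduce job: map, group by key, reduce.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

# Made-up (dept, amount) records.
sales = [("a", 10), ("b", 7), ("a", 15), ("c", 30), ("b", 9)]

# Job 1: total sales per department.
def map_by_dept(record):
    dept, amount = record
    yield dept, amount

def reduce_sum(dept, amounts):
    yield (dept, sum(amounts))

# Job 2: send every total to a single key, then rank them in the reducer.
def map_to_single_key(record):
    yield "all", record

def reduce_rank(_key, totals):
    ordered = sorted(totals, key=lambda t: t[1], reverse=True)
    for rank, (dept, total) in enumerate(ordered, start=1):
        yield (dept, total, rank)

totals = run_job(sales, map_by_dept, reduce_sum)
ranked = run_job(totals, map_to_single_key, reduce_rank)
print(ranked)
```

Note that the second job funnels everything through one reducer, so it only scales while the intermediate result (one row per department here) stays small.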

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow