Question

I have been studying NoSQL and Hadoop for data warehousing, but I have never worked with these technologies before. To check whether I have understood them correctly, I would like to ask whether the following is possible.

If I have my data stored in MongoDB, can I use Hadoop with Hive to run HiveQL queries directly against MongoDB and store the output of those queries as views back in MongoDB, instead of in HDFS?

Also, if I understand correctly, most NoSQL databases don't support joins and aggregates, but these can be implemented with MapReduce. If HiveQL queries are executed as MapReduce jobs, then when I do a join in HiveQL, would it automatically "join" the MongoDB data with MapReduce for me, so that I don't need to worry about MongoDB's lack of support for joins and aggregates?


Solution

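MongoDB does have very good support for aggregation functions. There are no joins, of course (MongoDB 3.2 later added the $lookup aggregation stage, which performs a left outer join). MongoDB schemas are usually designed, for example by embedding related data in a single document, so that you typically do not need a join.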
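To illustrate the schema point, here is a minimal pure-Python sketch (no MongoDB server required). It assumes a hypothetical orders collection in which each order embeds its line items, so a per-customer total needs no join. The pipeline list shows roughly what the equivalent aggregation-framework query could look like ($unwind, $group, $sum and $multiply are real aggregation operators); total_per_customer is a hypothetical plain-Python stand-in that mimics those two stages.

```python
from collections import defaultdict

# Hypothetical documents: each order embeds its items, so no join is needed.
orders = [
    {"_id": 1, "customer": "alice",
     "items": [{"sku": "A", "qty": 2, "price": 5.0},
               {"sku": "B", "qty": 1, "price": 10.0}]},
    {"_id": 2, "customer": "bob",
     "items": [{"sku": "A", "qty": 1, "price": 5.0}]},
    {"_id": 3, "customer": "alice",
     "items": [{"sku": "C", "qty": 3, "price": 2.0}]},
]

# Sketch of an equivalent aggregation pipeline, e.g. for db.orders.aggregate(pipeline).
pipeline = [
    {"$unwind": "$items"},
    {"$group": {
        "_id": "$customer",
        "total": {"$sum": {"$multiply": ["$items.qty", "$items.price"]}},
    }},
]

def total_per_customer(docs):
    """Plain-Python stand-in for the $unwind + $group stages above."""
    totals = defaultdict(float)
    for doc in docs:
        for item in doc["items"]:  # $unwind: one row per embedded item
            # $group by customer, $sum of qty * price
            totals[doc["customer"]] += item["qty"] * item["price"]
    return dict(totals)

print(total_per_customer(orders))  # {'alice': 26.0, 'bob': 5.0}
```

Because the items live inside each order document, everything the aggregation needs is already in one place, which is the design choice that makes joins largely unnecessary.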

By default, HiveQL operates on tables stored in HDFS. However, the MongoDB-Hadoop Connector (http://docs.mongodb.org/ecosystem/tools/hadoop/) lets you query MongoDB data from within Hadoop.
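On the join question: with Hive's classic MapReduce execution engine, a HiveQL join is compiled into MapReduce jobs, so once MongoDB data is exposed to Hive through the connector, the join itself runs in Hadoop rather than in MongoDB. The pure-Python sketch below simulates one common strategy, a reduce-side (repartition) join: mappers tag each record with its source table and emit it under the join key, the shuffle groups records by key, and the reducer combines them. The users and orders tables and all function names are hypothetical, and a real Hive plan may choose a different strategy (for example a map-side join when one table is small).

```python
from collections import defaultdict
from itertools import chain

# Hypothetical tables, e.g. two MongoDB collections exposed to Hive.
users = [{"user_id": 1, "name": "alice"}, {"user_id": 2, "name": "bob"}]
orders = [
    {"order_id": 10, "user_id": 1, "amount": 30},
    {"order_id": 11, "user_id": 1, "amount": 12},
    {"order_id": 12, "user_id": 2, "amount": 7},
]

def map_phase(table_name, rows, key):
    """Mapper: emit (join_key, (source_tag, row)) for every row."""
    for row in rows:
        yield row[key], (table_name, row)

def shuffle(pairs):
    """Shuffle: group all mapped values by join key."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(key, tagged_rows):
    """Reducer: inner-join the rows from both tables that share one key."""
    left = [r for tag, r in tagged_rows if tag == "users"]
    right = [r for tag, r in tagged_rows if tag == "orders"]
    for u in left:
        for o in right:
            yield {"user_id": key, "name": u["name"], "amount": o["amount"]}

# Roughly: SELECT u.user_id, u.name, o.amount
#          FROM users u JOIN orders o ON u.user_id = o.user_id
mapped = chain(map_phase("users", users, "user_id"),
               map_phase("orders", orders, "user_id"))
joined = [row
          for k, vals in sorted(shuffle(mapped).items())
          for row in reduce_phase(k, vals)]

for row in joined:
    print(row)
```

The key idea is that the shuffle brings every record with the same join key to the same reducer, which is why MongoDB itself never has to perform the join.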

You can also run MapReduce in MongoDB itself, without Hadoop. See: http://docs.mongodb.org/manual/core/map-reduce/
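MongoDB's mapReduce takes a JavaScript map function that calls emit(key, value) and a reduce function that folds the values for each key. The following pure-Python sketch only simulates those semantics in memory to show the flow; the orders documents, field names and helper functions are hypothetical, and the comments show the kind of JavaScript functions they mirror. MongoDB requires reduce functions to tolerate being re-applied to partial results, which is why a simple associative, commutative sum is used here. Note that newer MongoDB versions (5.0 onward) deprecate map-reduce in favor of the aggregation pipeline.

```python
from collections import defaultdict

# Hypothetical documents standing in for an "orders" collection.
orders = [
    {"cust_id": "A123", "amount": 500, "status": "A"},
    {"cust_id": "A123", "amount": 250, "status": "A"},
    {"cust_id": "B212", "amount": 200, "status": "A"},
    {"cust_id": "A123", "amount": 300, "status": "D"},
]

def map_fn(doc):
    # Mirrors a JavaScript map such as: function() { emit(this.cust_id, this.amount); }
    yield doc["cust_id"], doc["amount"]

def reduce_fn(key, values):
    # Mirrors: function(key, values) { return Array.sum(values); }
    # Summation is associative and commutative, so re-reducing
    # partial results gives the same answer.
    return sum(values)

def map_reduce(docs, mapper, reducer, query=None):
    """Simulate map -> group by key -> reduce over an in-memory list."""
    grouped = defaultdict(list)
    for doc in docs:
        if query is None or query(doc):  # like the mapReduce "query" filter
            for key, value in mapper(doc):
                grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}

result = map_reduce(orders, map_fn, reduce_fn, query=lambda d: d["status"] == "A")
print(result)  # {'A123': 750, 'B212': 200}
```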

Licensed under: CC-BY-SA with attribution