Question

I have been studying NoSQL and Hadoop for data warehousing, but I have never worked with these technologies before. I would like to ask whether the following is possible, to check that I have understood them correctly.

If my data is stored in MongoDB, can I use Hadoop with Hive to run HiveQL queries directly against MongoDB and store the output of those queries as views back in MongoDB, instead of in HDFS?

Also, if I understand correctly, most NoSQL databases don't support joins and aggregates, but it is possible to implement them through map-reduce. If HiveQL queries are executed as map-reduce jobs, then when I do a join in HiveQL, would it automatically "join" the MongoDB data in map-reduce for me, so that I don't need to worry about the lack of support for joins and aggregates in MongoDB?

Solution

MongoDB does have very good built-in support for aggregation functions. There are no joins, of course; a MongoDB schema is usually designed so that you typically do not need one.
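As a rough illustration of that aggregation support, here is a minimal sketch of a group-and-sum pipeline written the way pymongo expects it. The field names `status`, `cust_id`, and `amount` are assumptions made up for the example, not anything from the question; the helper only relies on the collection object exposing `aggregate(pipeline)`.

```python
# Aggregation pipeline: filter, group by customer, sum amounts, sort.
# Field names ("status", "cust_id", "amount") are hypothetical.
PIPELINE = [
    {"$match": {"status": "A"}},                      # keep only matching documents
    {"$group": {"_id": "$cust_id",                    # one output document per customer
                "total": {"$sum": "$amount"}}},       # sum the amounts in each group
    {"$sort": {"total": -1}},                         # largest totals first
]


def totals_per_customer(collection):
    """Run PIPELINE on a pymongo-style collection and return the results as a list."""
    return list(collection.aggregate(PIPELINE))
```

With pymongo installed and a server running, this could be called as `totals_per_customer(MongoClient()["shop"]["orders"])`, where the database and collection names are again placeholders.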

By default, HiveQL operates on tables stored in HDFS. However, there is a MongoDB-Hadoop Connector (http://docs.mongodb.org/ecosystem/tools/hadoop/) that lets you query MongoDB data from within Hadoop.

If you just want to use map-reduce, you can do that with MongoDB itself, without Hadoop. See: http://docs.mongodb.org/manual/core/map-reduce/
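To make the "join through map-reduce" idea from the question concrete, here is a minimal sketch in plain Python of a reduce-side join with a sum. It does not use MongoDB, Hive, or Hadoop; it only mimics the map, shuffle, and reduce steps in memory. The record shapes (`_id`, `name`, `cust_id`, `amount`) and all function names are assumptions made up for the illustration.

```python
from collections import defaultdict


def map_phase(customers, orders):
    """Emit (join_key, tagged_record) pairs from both inputs."""
    for c in customers:
        yield c["_id"], ("customer", c)
    for o in orders:
        yield o["cust_id"], ("order", o)


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine the customer record with the sum of its orders.

    Keys with no customer record produce no output; customers with
    no orders get a total of 0.
    """
    names = [rec["name"] for tag, rec in values if tag == "customer"]
    total = sum(rec["amount"] for tag, rec in values if tag == "order")
    return [{"cust_id": key, "name": name, "total": total} for name in names]


def join_and_sum(customers, orders):
    """Run map, shuffle, and reduce, returning rows sorted by customer id."""
    results = []
    for key, values in shuffle(map_phase(customers, orders)).items():
        results.extend(reduce_phase(key, values))
    return sorted(results, key=lambda row: row["cust_id"])
```

A query engine that compiles a join plus `SUM` into map-reduce jobs does something broadly similar on your behalf, which is why the lack of joins in the underlying store matters less once the data is read through such a layer.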

Licensed under: CC-BY-SA with attribution