Question

Hi, I am having trouble coming up with areas where MapReduce would not be suitable. I understand that there is no point in using MapReduce with small amounts of data. But what kinds of 'complex' queries are not suitable for MapReduce?

For example, for a business with petabytes of data, MapReduce would be a good fit for a query that, say, adds up the total quantity of one type of product sold within a month, correct?

What complex queries would a business typically run for which MapReduce adds no real benefit?

Solution

Normally, most queries in MapReduce/Hive are aggregate queries, but you can have non-aggregate queries too. Such a query is simply one with no "reduce" operation.
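To make the distinction concrete, here is a minimal pure-Python sketch of the two shapes of job. It is not Hadoop or Hive code; the `SALES` records, field layout, and function names are all made up for illustration. The aggregate version has a map phase and a reduce phase, while the map-only version just filters rows.

```python
from collections import defaultdict

# Hypothetical sales records: (product_type, month, quantity)
SALES = [
    ("widget", "2014-01", 3),
    ("gadget", "2014-01", 5),
    ("widget", "2014-01", 4),
    ("widget", "2014-02", 1),
]


def map_sale(record, month):
    """Map phase: emit (product_type, quantity) for sales in the given month."""
    product_type, sale_month, quantity = record
    if sale_month == month:
        yield product_type, quantity


def reduce_quantities(key, values):
    """Reduce phase: sum every quantity emitted for one key."""
    return key, sum(values)


def run_aggregate(records, month):
    """Aggregate query: map, group by key (the 'shuffle'), then reduce."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_sale(record, month):
            grouped[key].append(value)
    return dict(reduce_quantities(k, v) for k, v in grouped.items())


def run_map_only(records, month):
    """Non-aggregate query: a filter with no reduce step at all."""
    return [record for record in records if record[1] == month]


totals = run_aggregate(SALES, "2014-01")
january_rows = run_map_only(SALES, "2014-01")
print(totals)
print(january_rows)
```

In a real cluster the grouping step is the expensive shuffle across machines, which is why a map-only job is cheaper than an aggregate one.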

Hive can handle fairly complex queries by chaining multiple queries, using windowing functions, and so on. So I am not sure the statement "mapreduce is suitable with simple aggregate queries" is completely true.

The business queries that are not appropriate for MapReduce/Hive are real-time queries, for example a trending query such as the top hashtags on Twitter. The overhead of launching a batch job would make them inefficient.

Another poor fit is data that has to stay normalized for some reason, because MapReduce/Hive basically wants the data in one table. For example, if you had a highly normalized "point of sale" database, almost any query against it would be painful unless you denormalized the data first.
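As a rough illustration of what "denormalize first" means, the sketch below flattens a tiny, made-up point-of-sale schema into one wide table and then aggregates over it in a single pass. The table names, fields, and values are hypothetical, and plain Python dictionaries stand in for the real tables.

```python
from collections import defaultdict

# Hypothetical normalized tables, keyed by id.
PRODUCTS = {
    1: {"name": "widget", "category": "hardware"},
    2: {"name": "gadget", "category": "hardware"},
    3: {"name": "manual", "category": "books"},
}
STORES = {
    10: {"city": "Austin"},
    20: {"city": "Boston"},
}
SALE_LINES = [
    {"product_id": 1, "store_id": 10, "quantity": 3},
    {"product_id": 2, "store_id": 10, "quantity": 2},
    {"product_id": 1, "store_id": 20, "quantity": 4},
    {"product_id": 3, "store_id": 10, "quantity": 1},
]


def denormalize(sale_lines, products, stores):
    """Resolve every foreign key once, producing one flat row per sale line."""
    rows = []
    for line in sale_lines:
        product = products[line["product_id"]]
        store = stores[line["store_id"]]
        rows.append({
            "product": product["name"],
            "category": product["category"],
            "city": store["city"],
            "quantity": line["quantity"],
        })
    return rows


def quantity_by_category_and_city(flat_rows):
    """Single-pass aggregate over the flat table; no lookups needed."""
    totals = defaultdict(int)
    for row in flat_rows:
        totals[(row["category"], row["city"])] += row["quantity"]
    return dict(totals)


flat = denormalize(SALE_LINES, PRODUCTS, STORES)
report = quantity_by_category_and_city(flat)
print(report)
```

Once the rows are flat, each query is a simple map and reduce over one input; without this step every query would have to repeat the lookups as joins.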

OTHER TIPS

One example would be reporting and data visualization. Some BI reporting tools offer Hive as a plugin, but you may not want to wait minutes for the data processing/MapReduce job to complete. In that case, you would run an ETL step to move the data from HDFS into an RDBMS such as MySQL or Infobright and run your reports against the RDBMS.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow