Question

I had a strange experience while running a Hive query (a simple count of the entries in an external table) alongside a normal MapReduce job (a word count program). The word count job was started first and the Hive query second. The Hive query was somehow fast, while the word count job got stuck. Is there any case where a Hive MapReduce job blocks all other MapReduce jobs running alongside it?

I would appreciate your views on this question.

Solution

I am assuming this does not happen consistently. Hive does not block other jobs on the cluster. Cluster load and network latency can affect which job finishes first. If you are trying to compare two jobs to see which one is faster, submit them at the same time, run the test at least 5-10 times, and compare the average times.
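The repeated-runs advice can be sketched as a small timing harness. This is only an illustration, not part of the original answer: the helper names `time_command` and `summarize` are hypothetical, and the command shown is a harmless placeholder that you would replace with your own job submission commands.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times; return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so failed runs are not averaged in.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return mean, min and max so a single outlier run is easy to spot."""
    return {
        "mean": statistics.mean(durations),
        "min": min(durations),
        "max": max(durations),
    }


# Placeholder command (assumption): it only starts a Python interpreter.
# Replace it with the command that submits your word count job or Hive query.
placeholder_cmd = [sys.executable, "-c", "pass"]
print(summarize(time_command(placeholder_cmd, runs=3)))
```

Looking at the minimum and maximum next to the mean helps show whether one run was slowed down by unrelated cluster load.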

OTHER TIPS

The order in which jobs complete is dependent upon the number of map and reduce tasks that are requested by the job, as well as the cluster's scheduler configuration.

If a job requests more reduce tasks than the cluster has slots available, other jobs are forced to wait until a reduce task completes. The scheduler can then assign the freed reduce slot to a waiting job (again, depending on the scheduler configuration).
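The waiting effect described above can be illustrated with a toy simulation. This is a deliberately simplified model, not how Hadoop is implemented: it assumes a strict first-in-first-out task queue, a fixed number of reduce slots, and identical task durations within a job. The function name `simulate_fifo` and the job numbers are made up for the example.

```python
import heapq


def simulate_fifo(num_slots, jobs):
    """Simulate strict FIFO slot assignment.

    jobs: list of (name, num_tasks, task_seconds) in submission order.
    Returns {job name: time at which its last task finishes}.
    """
    # Each heap entry is the time at which one slot becomes free.
    slot_free_at = [0] * num_slots
    heapq.heapify(slot_free_at)

    finish_times = {}
    for name, num_tasks, task_seconds in jobs:
        job_finish = 0
        for _ in range(num_tasks):
            # The next queued task takes whichever slot frees up first.
            start = heapq.heappop(slot_free_at)
            end = start + task_seconds
            heapq.heappush(slot_free_at, end)
            job_finish = max(job_finish, end)
        finish_times[name] = job_finish
    return finish_times


# Job A asks for more tasks than there are slots; job B needs one short task.
jobs = [("A", 8, 10), ("B", 1, 2)]

print(simulate_fifo(4, jobs))   # {'A': 20, 'B': 22}: B waits behind A
print(simulate_fifo(16, jobs))  # {'A': 10, 'B': 2}: enough slots, no waiting
```

With only 4 slots, the short job B cannot start until A's tasks release a slot. Other scheduler configurations, such as fair or capacity scheduling, would share the slots differently, which is why the outcome depends on how the cluster is set up.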

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow