In Hadoop 1.0.1, how can I use the API to find the weight of each stage in a reduce task?

More specifically, I am trying to see how much of the reduce task's time is taken up by the shuffle phase, the sort (ordering) phase and the merge phase.


Solution

I think one way to achieve this is to compare the total time taken by the task with the time taken by the shuffle and sort phases. You could use the following methods provided by the TaskStatus class to find that:

getStartTime() : Get the start time of the task.

getFinishTime() : Get the task finish time. If shuffleFinishTime and sortFinishTime have not been set before, they are set to finishTime. This takes care of the case where shuffle, sort and finish all complete within one heartbeat interval and are not reported separately. If the task state is TaskStatus.FAILED, the finish time represents when the task failed.

getShuffleFinishTime() : Get the shuffle finish time for the task. If the shuffle finish time was not set because the shuffle/sort/finish phases ended within the same heartbeat interval, it is set to the finish time of the next phase, i.e. sort or task finish, when these are set.

getSortFinishTime() : Get the sort finish time for the task. If the sort finish time was not set because the sort and reduce phases finished in the same heartbeat interval, it is set to the finish time, when the finish time is set.
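As a rough sketch of how those four timestamps could be turned into weights: the helper below takes plain long values (milliseconds) and returns the fraction of the task spent in each interval. The class name ReducePhaseWeights and the sample timestamps are made up for illustration; in a real job the arguments would come from the TaskStatus getters listed above. Note that four timestamps only delimit three intervals (shuffle, sort, and the remaining reduce work), so this does not give a separate figure for merging.

```java
// Sketch only: computes phase weights from four timestamps.
// Assumption: the values are the ones returned by getStartTime(),
// getShuffleFinishTime(), getSortFinishTime() and getFinishTime().
final class ReducePhaseWeights {

    /** Returns {shuffle, sort, reduce} as fractions of the total task time. */
    static double[] compute(long start, long shuffleFinish, long sortFinish, long finish) {
        long total = finish - start;
        if (total <= 0) {
            throw new IllegalArgumentException("finish time must be after start time");
        }
        double shuffle = (double) (shuffleFinish - start) / total;
        double sort = (double) (sortFinish - shuffleFinish) / total;
        double reduce = (double) (finish - sortFinish) / total;
        return new double[] {shuffle, sort, reduce};
    }

    public static void main(String[] args) {
        // Hypothetical timestamps in milliseconds, not taken from a real job.
        double[] w = compute(1000L, 5000L, 6000L, 11000L);
        System.out.printf("shuffle=%.2f sort=%.2f reduce=%.2f%n", w[0], w[1], w[2]);
    }
}
```

If the phases finish within a single heartbeat interval, the shuffle and sort finish times fall back to later timestamps as described above, so some of these weights can come out as zero.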

Another approach could be to use the Counters.

Do let me know if this answers your query. Thank you.

Licensed under: CC-BY-SA with attribution