Question

Are there any tools for monitoring performance on a Hadoop cluster running on Windows? We installed Hortonworks HDP 2.2.0 on a Windows single-node cluster and tested our jar; we were able to process 5 million records in 26 minutes. We have now set up a cluster with 4 slave machines and 1 name node. Each machine has only 8 GB of RAM, since we are just doing a proof of concept. We see no improvement in processing time on the cluster. Are there any tools that could point out the problem? All the ones available seem to be written for Linux.

Thanks, Kishore.


Solution

5 million records doesn't sound like a lot to throw at Hadoop. What's the size of your data in GB?

I don't know of any Hadoop monitoring tools for Windows, but you should start with the basics. Is your data splittable? Have a look at the ResourceManager's web UI: how many containers did your MapReduce application get? Were they distributed across all machines? (The capacity scheduler tends not to spread the load over several machines if it can fit all of it on one.) What were the CPU usage and I/O per task attempt? A sketch of pulling this information from the ResourceManager follows below.
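As a starting point, here is a minimal sketch of querying the YARN ResourceManager REST API for per-application resource usage and per-node container counts. It assumes the ResourceManager web UI is reachable on its default port 8088; the hostname is a placeholder you would replace with your own.

```python
# Minimal sketch: query the YARN ResourceManager REST API to see how many
# containers ran, on which nodes, and how much memory/vcore time each app used.
# RM_HOST is a placeholder; port 8088 is the default ResourceManager web port.
import json
import urllib.request

RM_HOST = "namenode.example.local"  # placeholder hostname

def fetch_json(path):
    url = "http://{}:8088{}".format(RM_HOST, path)
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Finished applications and their aggregate resource consumption.
apps = fetch_json("/ws/v1/cluster/apps?states=FINISHED").get("apps") or {}
for app in apps.get("app", []):
    print(app["id"], app["name"],
          "memorySeconds:", app.get("memorySeconds"),
          "vcoreSeconds:", app.get("vcoreSeconds"))

# Per-node view: are containers actually landing on all slave machines?
nodes = fetch_json("/ws/v1/cluster/nodes").get("nodes") or {}
for node in nodes.get("node", []):
    print(node["nodeHostName"], node["state"],
          "containers:", node.get("numContainers"),
          "usedMemoryMB:", node.get("usedMemoryMB"))
```

If all containers end up on one node, that points at scheduling or split-size issues rather than raw machine performance.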

You should also record, compare, and analyze Windows performance counters (CPU, I/O, network) on each node to see whether you have any bottlenecks; see the sketch below.
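For collecting those counters, a minimal sketch using the built-in Windows typeperf tool is below. The counter paths, sampling interval, and output file name are illustrative choices, not a prescription.

```python
# Minimal sketch: log a few Windows performance counters to CSV with the
# built-in typeperf tool, so CPU, disk, and network load can be compared
# across nodes while a job runs. Counter paths and the output file name
# are illustrative; adjust them to the counters you care about.
import subprocess

COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\PhysicalDisk(_Total)\Disk Bytes/sec",
    r"\Network Interface(*)\Bytes Total/sec",
    r"\Memory\Available MBytes",
]

# Sample every 5 seconds, 360 samples (~30 minutes), written to hadoop_node.csv.
subprocess.run(
    ["typeperf", *COUNTERS,
     "-si", "5", "-sc", "360", "-f", "CSV", "-o", "hadoop_node.csv"],
    check=True,
)
```

Run this on every node during a job, then line the CSVs up to see whether one machine is saturated while the others sit idle.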

OTHER TIPS

You may not need Windows-native tools to surface the performance metrics you're looking for. If you're after metrics from YARN, MapReduce, or HDFS, each of those technologies exposes metrics out of the box through a web interface/HTTP endpoint.

With HDFS, for example, you can collect metrics from the NameNode and DataNodes over HTTP. You can also access the full suite of metrics via JMX, though that option requires a little more configuration. A sketch of reading NameNode metrics over HTTP follows.
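Here is a minimal sketch of pulling HDFS metrics from the NameNode's built-in /jmx HTTP servlet. It assumes the default HDP 2.x NameNode web UI port (50070); the hostname is a placeholder, and the attribute names shown are typical FSNamesystemState metrics.

```python
# Minimal sketch: read HDFS metrics from the NameNode's /jmx HTTP servlet.
# Assumes the default NameNode web UI port (50070 in HDP 2.x); NAMENODE is
# a placeholder for your name node's hostname.
import json
import urllib.request

NAMENODE = "namenode.example.local"  # placeholder hostname

url = ("http://{}:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState"
       .format(NAMENODE))
with urllib.request.urlopen(url) as resp:
    beans = json.loads(resp.read().decode("utf-8")).get("beans", [])

if beans:
    fs = beans[0]
    print("NumLiveDataNodes:", fs.get("NumLiveDataNodes"))
    print("CapacityUsed:", fs.get("CapacityUsed"))
    print("CapacityRemaining:", fs.get("CapacityRemaining"))
    print("UnderReplicatedBlocks:", fs.get("UnderReplicatedBlocks"))
```

The same /jmx servlet is available on the DataNodes' web ports, so the same approach works for per-node HDFS metrics.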

I wrote a guide to collecting Hadoop performance metrics with native tools which you might find useful. It details methods for collecting metrics for MapReduce, YARN, HDFS, and ZooKeeper.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow