Question

I set up a cluster of ten machines with CDH4 (YARN) installed. The NameNode, the ResourceManager, and the HistoryServer run on the same node, and the client runs on another node.

On the remaining machines, I run a DataNode and a NodeManager. I launched my application on a 100 GB file. It worked at first and was relatively quick, but now it becomes very slow at the end of the map phase (going from about 90% to 100% takes 30 minutes).

I don't know whether the problem comes from the way I coded the program or the way I configured Cloudera CDH4. The job works sometimes and fails to finish quickly at other times, although I didn't change anything.


Solution

I found out why it took so long at the end. I had assumed that the command hadoop fs -expunge would empty the trash, but it did not, so very little space was left in HDFS. When Hadoop tried to write the job's results to HDFS, the writes were very slow.
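
For anyone hitting the same symptom, here is a minimal sketch of how the free space and the trash could be checked from the client node. It assumes the default trash location /user/<username>/.Trash; the function names trash_dir, check_space, and purge_trash are made up for illustration and are not part of Hadoop. Note that -skipTrash deletes permanently, so check the path before running it.

```shell
#!/bin/sh
# Sketch only: helpers for inspecting HDFS free space and a user's trash.
# Assumption: trash lives in the default location /user/<name>/.Trash.

# Print the assumed trash directory for a user (defaults to the current user).
trash_dir() {
    echo "/user/${1:-$(whoami)}/.Trash"
}

# Show overall HDFS capacity and usage, then the total size of the trash.
check_space() {
    hadoop fs -df -h /
    hadoop fs -du -s -h "$(trash_dir "$1")"
}

# Permanently delete the trash directory, bypassing the trash mechanism.
purge_trash() {
    hadoop fs -rm -r -skipTrash "$(trash_dir "$1")"
}
```

After sourcing the file, running check_space shows whether the trash is what fills the disk; if it is, purge_trash followed by check_space again should show the space being released.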

Licensed under: CC-BY-SA with attribution