Hadoop "Spill failed" exception on an EC2 instance with 420 GB of instance storage

https://stackoverflow.com/questions/22108146

18-10-2022

Question

I am using Hadoop 2.3.0, installed as a single-node cluster (pseudo-distributed mode) on a CentOS 6.4 Amazon EC2 instance with 420 GB of instance storage and 7.5 GB of RAM. My understanding is that the "Spill failed" exception only occurs when the node runs out of disk space. However, after running map/reduce tasks for only a short time (nowhere near 420 GB of data), I get the exception shown below.

I should mention that I moved the Hadoop installation from an 8 GB EBS volume (where I had originally installed it) to a 420 GB instance store volume on the same node, and changed the $HADOOP_HOME environment variable and other properties to point to the instance store volume accordingly. Hadoop 2.3.0 is now completely contained on the 420 GB drive.

However, I still see the following exception. Is there anything besides disk space that can cause the "Spill failed" exception?

2014-02-28 15:35:07,630 ERROR [IPC Server handler 12 on 58189] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1393591821307_0013_m_000000_0 - exited : 
java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1533)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1442)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_1393591821307_0013_m_000000_0_spill_26.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:402)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
    at org.apache.hadoop.mapred.YarnOutputFiles.getSpillFileForWrite(YarnOutputFiles.java:159)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1564)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:853)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1503)


2014-02-28 15:35:07,604 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: Spill failed
2014-02-28 15:35:07,605 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1533)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1442)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_1393591821307_0013_m_000000_0_spill_26.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:402)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
    at org.apache.hadoop.mapred.YarnOutputFiles.getSpillFileForWrite(YarnOutputFiles.java:159)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1564)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:853)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1503)

Solution

I was able to solve this by setting the hadoop.tmp.dir value to a directory on the instance storage. By default it was pointing to the EBS-backed root volume.
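
As a minimal sketch of that change, the property can be set in core-site.xml. The path /mnt/hadoop-tmp below is a hypothetical mount point for the instance store, not one taken from the original setup; substitute whichever directory the 420 GB volume is mounted on. In Hadoop 2.x the default for hadoop.tmp.dir is /tmp/hadoop-${user.name}, and the default for yarn.nodemanager.local-dirs is ${hadoop.tmp.dir}/nm-local-dir. That would explain why spill files landed on the small root volume even though the installation itself had been moved.

```xml
<!-- core-site.xml: sketch only; /mnt/hadoop-tmp is an assumed path -->
<configuration>
  <property>
    <!-- Base directory for Hadoop's temporary and local working files -->
    <name>hadoop.tmp.dir</name>
    <value>/mnt/hadoop-tmp</value>
  </property>
</configuration>
```

The Hadoop daemons need a restart for the new value to take effect, and the directory has to exist and be writable by the user running Hadoop (root, according to the log above).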

Licensed under: CC-BY-SA with attribution
