Amazon EMR : java.io.IOException: File already exists: s3n://<bucketname>/output/part-r-00002

https://stackoverflow.com/questions/16369794

14-04-2022

Problem

I am running a MapReduce job. My code consists of a single class that performs a simple calculation. It runs successfully on a single-node Hadoop 1.0.3 setup, but when I run it on EMR I get the following error:

java.io.IOException: File already exists: s3n://<bucketname>/output/part-r-00002
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.create(NativeS3FileSystem.java:647)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:557)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:538)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:445)
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:128)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:583)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:652)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:426)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

Solution

Configure the job to write its results to a different output directory each time it runs.

The error occurs because a file already exists at the target location, most likely because the job has been run more than once with the same output path.
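One way to get a different output directory on every run is to append a timestamp to a fixed base path. The following is a minimal sketch of that idea, not the asker's actual code: the class `UniqueOutputPath`, the method `forRun`, and the bucket name `example-bucket` are all hypothetical. It only builds the path string, on the assumption that the driver then passes it to Hadoop's `FileOutputFormat.setOutputPath(job, new Path(...))`.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Builds a distinct output location for each job run (hypothetical helper). */
final class UniqueOutputPath {

    // UTC timestamp with one-second resolution, so paths sort chronologically.
    private static final DateTimeFormatter RUN_STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss").withZone(ZoneOffset.UTC);

    private UniqueOutputPath() {}

    /** Returns baseOutput + "/run-<timestamp>" for the given start time. */
    static String forRun(String baseOutput, Instant startedAt) {
        String base = baseOutput.replaceAll("/+$", ""); // drop trailing slashes
        return base + "/run-" + RUN_STAMP.format(startedAt);
    }

    public static void main(String[] args) {
        // "s3n://example-bucket/output" is a placeholder, not the asker's bucket.
        String base = args.length > 0 ? args[0] : "s3n://example-bucket/output";
        String outputDir = forRun(base, Instant.now());
        System.out.println(outputDir);
        // Assumed use in the job driver (requires the Hadoop libraries):
        // FileOutputFormat.setOutputPath(job, new Path(outputDir));
    }
}
```

Because the timestamp has one-second resolution, two runs started within the same second would still collide. If that is a realistic case, a job ID or random suffix could be appended as well.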

License: CC-BY-SA with attribution

Not affiliated with StackOverflow