Problem

I'm running an EMR Activity inside a Data Pipeline to analyze log files, and I get the following error when my Pipeline fails:

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://10.211.146.177:9000/home/hadoop/temp-output-s3copy-2013-05-24-00 already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:944)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:905)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:905)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:879)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1316)
    at com.valtira.datapipeline.stream.CloudFrontStreamLogProcessors.main(CloudFrontStreamLogProcessors.java:216)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:187)

I've tried deleting that folder by adding:

FileSystem fs = FileSystem.get(getConf());
fs.delete(new Path("path/to/file"), true); // delete file, true for recursive

but it does not work. Is there a way to override the FileOutputFormat check from Hadoop in Java? Is there a way to ignore this error in Java?


Solution

The path of the directory to be deleted changes on every run, because the output directory name includes the date. There are two ways to delete it:

From the shell, try this:

hadoop dfs -rmr hdfs://127.0.0.1:9000/home/hadoop/temp-output-s3copy-*
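
(Side note, not part of the original answer: on newer Hadoop releases the hadoop dfs front end and the -rmr flag are deprecated, so the equivalent command is the one below; the wildcard works the same way.)

hdfs dfs -rm -r hdfs://127.0.0.1:9000/home/hadoop/temp-output-s3copy-*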

To do it via Java code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.mortbay.log.Log;

public class FSDeletion {

  public static void main(String[] args) {

    try {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);

      String fsName = conf.get("fs.default.name", "localhost:9000");
      String baseDir = "/home/hadoop/";
      String outputDirPattern = fsName + baseDir + "temp-output-s3copy-";

      Path[] paths = new Path[1];
      paths[0] = new Path(baseDir);

      FileStatus[] status = fs.listStatus(paths);
      Path[] listedPaths = FileUtil.stat2Paths(status);
      for (Path p : listedPaths) {
        if (p.toString().startsWith(outputDirPattern)) {
          Log.info("Attempting to delete : " + p);
          boolean result = fs.delete(p, true);
          Log.info("Deleted ? : " + result);
        }
      }

      fs.close();
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}
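
If the cleanup can run inside the job driver itself, an alternative is to delete whatever output path the job is already configured with, instead of scanning /home/hadoop/ for matching names. The following is a minimal sketch, assuming the old org.apache.hadoop.mapred API that appears in the stack trace; the class name DeleteConfiguredOutput and the idea of calling it right before JobClient.runJob are illustrative additions, not part of the original code:

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

public class DeleteConfiguredOutput {

  // Delete the output directory that checkOutputSpecs() would otherwise reject.
  public static void deleteIfExists(JobConf conf) throws IOException {
    // The path previously set via FileOutputFormat.setOutputPath(conf, ...)
    Path output = FileOutputFormat.getOutputPath(conf);
    if (output != null) {
      FileSystem fs = output.getFileSystem(conf);
      if (fs.exists(output)) {
        fs.delete(output, true); // true = recursive
      }
    }
  }
}

Calling deleteIfExists(conf) immediately before JobClient.runJob(conf) removes a leftover directory from an earlier failed attempt of the same run, whatever date suffix the job happens to use.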