Question

I am using fs.copyFromLocalFile(local path, HDFS dest path) in my program, and I delete the destination path on HDFS each time before copying the files from the local machine. But after copying the files from the local path and running MapReduce on them, the destination contains two copies of each file, so the word count doubles.

To be clear, my local path is "Home/user/desktop/input/" and the HDFS destination path is "/input".
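
For reference, the copy step in my program looks roughly like this (a simplified sketch, not the exact code; the class and variable names are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyInput {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path localDir = new Path("Home/user/desktop/input/");
            Path hdfsDest = new Path("/input");

            // Delete the destination on every run, then copy the whole local directory.
            if (fs.exists(hdfsDest)) {
                fs.delete(hdfsDest, true); // true = recursive
            }
            fs.copyFromLocalFile(localDir, hdfsDest);
        }
    }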

When I check the HDFS destination path, i.e. the folder on which MapReduce was applied, this is the result:

    hduser@rallapalli-Lenovo-G580:~$ hdfs dfs -ls /input
    14/03/30 08:30:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 4 items
    -rw-r--r--   1 hduser supergroup         62 2014-03-30 08:28 /input/1.txt
    -rw-r--r--   1 hduser supergroup         62 2014-03-30 08:28 /input/1.txt~
    -rw-r--r--   1 hduser supergroup         21 2014-03-30 08:28 /input/2.txt
    -rw-r--r--   1 hduser supergroup         21 2014-03-30 08:28 /input/2.txt~

When I provide a single file such as Home/user/desktop/input/1.txt as input, there is no problem and only that one file is copied. But specifying the directory creates the problem, while manually placing each file in the HDFS destination through the command line causes no problem.

I am not sure if I am missing some simple file system logic, but it would be great if anyone could suggest where I am going wrong.

I am using Hadoop 2.2.0.

I have tried deleting the local temporary files and made sure the text files are not open. I am looking for a way to avoid copying the temporary files.

Thanks in advance.

Solution

The files /input/1.txt~ and /input/2.txt~ are backup files created by the text editor you are using on your machine (many Linux editors, such as gedit, save the previous version of a file with a trailing ~). Because MapReduce takes every file in the input directory as input, these backups get counted too, which is why your word count doubles. Press Ctrl + H in your local directory to show the hidden backup files and delete them before copying.
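
Alternatively, you can make the copy step itself skip the backup files instead of deleting them by hand. A minimal sketch, reusing the paths from the question (the class name and the filtering logic are illustrative, not part of your original program):

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyInputFiltered {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Collect only the real input files, skipping editor backups
            // ending in "~". Assumes the local directory exists.
            List<Path> srcs = new ArrayList<Path>();
            for (File f : new File("Home/user/desktop/input").listFiles()) {
                if (f.isFile() && !f.getName().endsWith("~")) {
                    srcs.add(new Path(f.getAbsolutePath()));
                }
            }

            Path dst = new Path("/input");
            if (fs.exists(dst)) {
                fs.delete(dst, true); // start from a clean destination each run
            }
            fs.mkdirs(dst);

            // This overload copies an explicit list of local files into the destination.
            fs.copyFromLocalFile(false, true, srcs.toArray(new Path[srcs.size()]), dst);
        }
    }

With the backups filtered out (or deleted), each run copies exactly one copy of each input file and the word counts come out correct.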

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow