Hadoop: Error in configuring object

https://stackoverflow.com/questions/8980412

12-11-2019

Question

I'm trying to run the TeraSort benchmarks and I'm getting the following exception:

java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.<init>(MapTask.java:573)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 10 more
Caused by: java.lang.IllegalArgumentException: can't read paritions file
    at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.configure(TeraSort.java:213)
    ... 15 more
Caused by: java.io.FileNotFoundException: File _partition.lst does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:720)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
    at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.readPartitions(TeraSort.java:153)
    at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.configure(TeraSort.java:210)
    ... 15 more

The TeraGen commands run fine and have created the input files for TeraSort. Here is the listing of my input directory:

bin/hadoop fs -ls /user/hadoop/terasort-input/
Warning: Maximum heap size rounded up to 1024 MB
Found 5 items
-rw-r--r--   1 sqatest supergroup           0 2012-01-23 14:13 /user/hadoop/terasort-input/_SUCCESS
drwxr-xr-x   - sqatest supergroup           0 2012-01-23 13:30 /user/hadoop/terasort-input/_logs
-rw-r--r--   1 sqatest supergroup         129 2012-01-23 15:49 /user/hadoop/terasort-input/_partition.lst
-rw-r--r--   1 sqatest supergroup 50000000000 2012-01-23 13:30 /user/hadoop/terasort-input/part-00000
-rw-r--r--   1 sqatest supergroup 50000000000 2012-01-23 13:30 /user/hadoop/terasort-input/part-00001

Here is the command I use to run TeraSort:

bin/hadoop jar hadoop-examples-0.20.203.0.jar terasort -libjars hadoop-examples-0.20.203.0.jar /user/hadoop/terasort-input /user/hadoop/terasort-output

I do see the file _partition.lst in my input directory, so I don't understand why I am getting the FileNotFoundException.

I followed the setup details provided at: http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/

Solution 2

The problem occurred because I was deploying the job on an NFS mount. I changed hadoop.tmp.dir to point to a local file system (/tmp), and the problem disappeared right away.
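If you are not sure whether a directory such as the one used for hadoop.tmp.dir sits on NFS, you can ask for its filesystem type. This is only a rough sketch: it assumes GNU coreutils `stat` (Linux), and the helper names `fs_type` and `is_nfs_type` are made up for illustration.

```shell
#!/bin/sh
# Print the filesystem type of the given path.
# Assumption: GNU coreutils stat, where -f -c %T prints the type name.
fs_type() {
    stat -f -c %T "$1"
}

# Succeed if a filesystem type name looks like NFS (e.g. "nfs", "nfs4").
is_nfs_type() {
    case "$1" in
        nfs*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Check the directory given as the first argument (defaults to /tmp).
dir="${1:-/tmp}"
if is_nfs_type "$(fs_type "$dir")"; then
    echo "$dir is on NFS"
else
    echo "$dir is not on NFS"
fi
```

Run it against the directory you intend to use before changing the Hadoop configuration.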

OTHER TIPS

I got this to work as follows:

I'm running in local mode from my Hadoop base directory (hadoop-1.0.0), with an input subdirectory under it, and I get the same error you do.

I edited the failing Java file to make it log the full path instead of just the filename, rebuilt it ("ant binary"), and reran it. It was looking for the file in the directory I was running from. I have no idea whether it was looking in the Hadoop base dir or the execution dir.

...so I made a symbolic link in the directory I run terasort in pointing to the real file in the input directory.

It's a cheap hack, but it works.

- Tim.
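The symlink workaround described above could look roughly like this. It is a sketch, not a tested fix: `link_partition_file` is a hypothetical helper name, and it assumes you call it from the directory you start TeraSort in.

```shell
#!/bin/sh
# Create a symlink named _partition.lst in the current directory that
# points at the real file inside the given input directory.
# link_partition_file is a hypothetical helper name.
link_partition_file() {
    input_dir=$1
    ln -s "$input_dir/_partition.lst" _partition.lst
}

# Usage (from the directory you launch terasort in):
#   link_partition_file input
```

A relative input directory works here because the link is created in the current directory, so the target is resolved relative to that same directory.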

Have you set up Hadoop to run in pseudo-distributed mode (or on a real cluster)? Unless you configure it otherwise, Hadoop runs in local job runner mode (as libraries inside a single process), and TeraSort does NOT work in LocalJobRunner mode. Look for the word LocalJobRunner in the output to check.

Here is a link on setting up HDFS, SSH and rsync: http://hadoop.apache.org/docs/r1.1.1/single_node_setup.html#PseudoDistributed
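To check for LocalJobRunner as suggested above, one option is to save the job output to a file and search it. This is a minimal sketch; `ran_in_local_mode` and the log file name `terasort.log` are made up for illustration.

```shell
#!/bin/sh
# Succeed if the saved job output mentions LocalJobRunner.
# ran_in_local_mode is a hypothetical helper name.
ran_in_local_mode() {
    grep -q 'LocalJobRunner' "$1"
}

# Usage: capture the job output first, then check it, e.g.
#   bin/hadoop jar hadoop-examples-0.20.203.0.jar terasort \
#       /user/hadoop/terasort-input /user/hadoop/terasort-output \
#       > terasort.log 2>&1
#   if ran_in_local_mode terasort.log; then
#       echo "job ran in LocalJobRunner mode"
#   fi
```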

I am using Cloudera CDH4 and faced a similar issue with another Hadoop program. I believe the issue is about linking external libraries.

The program worked fine in Eclipse (local mode), but when I tried to run it in pseudo-distributed mode, I got this error message.

Temporary solution:
- Created a JAR file from Eclipse with the library handling option "Copy required libraries into a sub-folder next to the generated JAR".
- Copied the JAR file to the Hadoop home directory (the path where the hadoop-examples JAR files are placed).

With this fix I am able to run the Hadoop program without any errors. Hope this helps.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow