Question

Hi, I am new to Cascading and am following the "Enterprise Data Workflows with Cascading" book. I am using Hadoop 1.0.4 and downloaded Cascading 2.1.6. I set everything up in the NetBeans IDE with all the jar files on the classpath.

Code:

    package main.java.impatient;

    import java.util.Properties;

    import cascading.flow.Flow;
    import cascading.flow.FlowDef;
    import cascading.flow.hadoop.HadoopFlowConnector;
    import cascading.pipe.Pipe;
    import cascading.property.AppProps;
    import cascading.scheme.hadoop.TextDelimited;
    import cascading.tap.Tap;
    import cascading.tap.hadoop.Hfs;
    import cascading.tuple.Fields;

    public class Main {
        public static void main( String[] args ) {
            String inPath = args[ 0 ];
            String outPath = args[ 1 ];

            Properties properties = new Properties();
            AppProps.setApplicationJarClass( properties, Main.class );
            HadoopFlowConnector flowConnector = new HadoopFlowConnector( properties );

            // create the source tap
            Tap inTap = new Hfs( new TextDelimited( true, "\t" ), inPath );

            // create the sink tap
            Tap outTap = new Hfs( new TextDelimited( true, "\t" ), outPath );

            // specify a pipe to connect the taps
            Pipe copyPipe = new Pipe( "copy" );

            // connect the taps, pipes, etc., into a flow
            FlowDef flowDef = FlowDef.flowDef()
                .addSource( copyPipe, inTap )
                .addTailSink( copyPipe, outTap );

            // run the flow
            flowConnector.connect( flowDef ).complete();
        }
    }

Here is the error:

    Exception in thread "main" cascading.flow.FlowException: step failed: (1/1) ...ka/cascading/part1/output, with job id: job_201310020226_0004, please see cluster logs for failure messages
        at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:210)
        at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:145)
        at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:120)
        at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:42)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:680)

Hadoop job error:

    java.io.IOException: Split class cascading.tap.hadoop.io.MultiInputSplit not found
        at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:387)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
    Caused by: java.lang.ClassNotFoundException: cascading.tap.hadoop.io.MultiInputSplit
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:249)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
        at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:385)
        ... 7 more

Can anyone help me?

Thank you.


Solution

This is similar to Cascading + libjars = ClassNotFoundException.

I added the following line to hadoop-env.sh and it solved the problem:

        export HADOOP_CLASSPATH="path_to_cascading_libs/lib/*":$HADOOP_CLASSPATH
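With the Cascading libs on HADOOP_CLASSPATH, the job can then be submitted in the usual way. Here is a minimal sketch; the main class is the one from the question, but the jar name and HDFS paths are placeholders:

        # main class from the question; jar name and HDFS paths are placeholders for your own
        hadoop jar impatient.jar main.java.impatient.Main /user/hadoop/rain.txt /user/hadoop/output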

OTHER TIPS

You should not download jars and set the classpath by hand; that is a poor way to manage Java project dependencies. You don't even need to install or configure Hadoop and Cascading yourself at all.

The best way to try the tutorial code is:

STEP 1: Clean your classpath of any Hadoop and Cascading jars that may cause conflicts.

STEP 2: Install the required dependencies with Gradle (the default build tool in the Impatient project):

cd $your_workspace
git clone git@github.com:Cascading/Impatient.git
cd Impatient
gradle install
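For reference, the dependency handling behind this step looks roughly like the sketch below. The repository URL and artifact coordinates are my assumptions (based on Cascading 2.1.6 from the question and the hadoop-core 1.1.2 download shown later), so check the Impatient project's real build.gradle rather than copying this:

    // Sketch only: coordinates, versions, and the repository URL are assumptions;
    // the Impatient project's own build.gradle is the authoritative source.
    apply plugin: 'java'

    repositories {
        mavenCentral()
        // Conjars is assumed here as the repository hosting the Cascading artifacts
        maven { url 'http://conjars.org/repo/' }
    }

    dependencies {
        compile 'cascading:cascading-core:2.1.6'
        compile 'cascading:cascading-hadoop:2.1.6'
        compile 'org.apache.hadoop:hadoop-core:1.1.2'
    }

Once Gradle resolves these for you, there is nothing left to download or put on the classpath by hand.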

STEP 3: Use the Eclipse (http://eclipse.org/) or IntelliJ IDEA IDE, because the Impatient project applies the idea and eclipse Gradle plugins by default. These tasks download all required dependencies, including the Cascading and Hadoop packages, and create IDE-specific project files:

gradle idea

or

gradle eclipse

Sample output:

$ gradle idea
:ideaModule
:ideaProject
:ideaWorkspace
:idea
:part1:ideaModule
Download http://repo1.maven.org/maven2/commons-logging/commons-logging/1.1/commons-logging-1.1.pom
Download http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-core/1.1.2/hadoop-core-1.1.2.pom
...
BUILD SUCCESSFUL

Total time: 1 mins 30.579 secs

You will then find IDE-specific project files (IDEA, for example):

 $ ls -al impatient.*
-rw-r--r-- 1 wheel  521 Oct 26 07:18 impatient.iml
-rw-r--r-- 1 wheel 4430 Oct 26 07:18 impatient.ipr
-rw-r--r-- 1 wheel 9299 Oct 26 07:18 impatient.iws

STEP 4: Import the existing project/module into IDEA or Eclipse.

Eclipse, IDEA, and NetBeans all have their own plugins for Gradle projects as well.

Hope this helps.

Your jar might not be properly structured. Test it with a flat (uber) jar built with the maven-shade-plugin.
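
For a Maven build, a minimal shade-plugin configuration would look roughly like this; the plugin version is a guess and the main class is simply taken from the question:

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <!-- version is a placeholder; use whatever is current for your build -->
      <version>2.1</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <transformers>
              <!-- sets the jar's Main-Class; main.java.impatient.Main is the class from the question -->
              <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                <mainClass>main.java.impatient.Main</mainClass>
              </transformer>
            </transformers>
          </configuration>
        </execution>
      </executions>
    </plugin>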

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow