Question

I was trying to run the simple yarn application from simple-yarn-app. But I am getting the following exception in my application error logs.

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/conf/YarnConfiguration
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
    at java.lang.Class.getMethod0(Class.java:2774)
    at java.lang.Class.getMethod(Class.java:1663)
    at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
    at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.conf.YarnConfiguration

But if I run "yarn classpath" command on all my datanodes, I see the following output:

/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-yarn/lib/*

which includes the paths to the yarn-client, yarn-api, yarn-common, and hadoop-common jars required by the application. Can anyone point me in the right direction? Where might I have forgotten to set the correct classpath?


Solution

I found that Hadoop does not resolve the $HADOOP_HOME and $YARN_HOME environment variables while iterating over the YarnConfiguration classpath entries. Running the following in your YARN client will print the unresolved entries, such as $HADOOP_HOME/ and $HADOOP_HOME/lib/:

YarnConfiguration conf = new YarnConfiguration();
// Print each entry of yarn.application.classpath as the client sees it;
// unresolved variables such as $HADOOP_HOME show up verbatim.
for (String c : conf.getStrings(
        YarnConfiguration.YARN_APPLICATION_CLASSPATH,
        YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH)) {
    System.out.println(c);
}

So, if you provide full paths in the yarn.application.classpath property, the NoClassDefFoundError goes away:

<property>
    <description>CLASSPATH for YARN applications. A comma-separated list of CLASSPATH entries</description>
    <name>yarn.application.classpath</name>
    <value>
        /etc/hadoop/conf,
        /usr/lib/hadoop/*,
        /usr/lib/hadoop/lib/*,
        /usr/lib/hadoop-hdfs/*,
        /usr/lib/hadoop-hdfs/lib/*,
        /usr/lib/hadoop-mapreduce/*,
        /usr/lib/hadoop-mapreduce/lib/*,
        /usr/lib/hadoop-yarn/*,
        /usr/lib/hadoop-yarn/lib/*
    </value>
</property>
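A YARN client typically takes these entries and joins them into the CLASSPATH environment variable of the containers it launches. The sketch below is a standalone approximation with no Hadoop dependency: the hard-coded entries array stands in for what conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH, ...) would return, and the method shows only the trim/dedupe/join step.

```java
import java.util.LinkedHashSet;

// Minimal sketch (no Hadoop dependency) of how a client assembles the
// container CLASSPATH from yarn.application.classpath entries.
public class ContainerClasspath {
    public static String build(String[] entries) {
        // Values read from the comma-separated <value> block in yarn-site.xml
        // arrive with surrounding whitespace, so trim each entry first.
        LinkedHashSet<String> cleaned = new LinkedHashSet<>();
        for (String e : entries) {
            String t = e.trim();
            if (!t.isEmpty()) {
                cleaned.add(t); // LinkedHashSet drops duplicates, keeps order
            }
        }
        return String.join(":", cleaned); // ':' separator for Linux containers
    }

    public static void main(String[] args) {
        String[] entries = {
            " /etc/hadoop/conf ",
            "/usr/lib/hadoop/*",
            "/usr/lib/hadoop/*",        // duplicate, will be dropped
            "/usr/lib/hadoop-yarn/lib/*"
        };
        System.out.println(build(entries));
        // prints /etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-yarn/lib/*
    }
}
```

The real client would put this string into the container launch context's environment map under the CLASSPATH key.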

OTHER TIPS

This issue occurs on YARN clusters where the ResourceManager and/or NodeManager daemons were started with an incomplete application classpath. Even something as simple as the bundled spark-shell will fail:

user@linux$ spark-shell --master yarn-client

Sadly, you only find out when launching your application, or after running it long enough to hit the missing class(es). To remedy this, I took the output of the following classpath command,

user@linux$ yarn classpath

and cleaned it up (because it contains duplicates and non-canonical entries), appended it to the below YARN configuration directive, which is found in /etc/hadoop/conf/yarn-site.xml, and finally restarted the YARN cluster daemons:

user@linux$ sudo vi /etc/hadoop/conf/yarn-site.xml
[ ... ]
<property>
  <name>yarn.application.classpath</name>
    <value>
      $HADOOP_CONF_DIR,
      $HADOOP_COMMON_HOME/*,
      $HADOOP_COMMON_HOME/lib/*,
      $HADOOP_HDFS_HOME/*,
      $HADOOP_HDFS_HOME/lib/*,
      $HADOOP_MAPRED_HOME/*,
      $HADOOP_MAPRED_HOME/lib/*,
      $YARN_HOME/*,
      $YARN_HOME/lib/*,
      /etc/hadoop/conf,
      /usr/lib/hadoop/*,
      /usr/lib/hadoop/lib,
      /usr/lib/hadoop/lib/*,
      /usr/lib/hadoop-hdfs,
      /usr/lib/hadoop-hdfs/*,
      /usr/lib/hadoop-hdfs/lib/*,
      /usr/lib/hadoop-yarn/*,
      /usr/lib/hadoop-yarn/lib/*,
      /usr/lib/hadoop-mapreduce/*,
      /usr/lib/hadoop-mapreduce/lib/*
    </value>
</property>

The entries above that don't reference environment variables are the ones I added. Remember to copy this modified file to all nodes in your YARN cluster before restarting the ResourceManager and NodeManager daemons.
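The cleanup step can be sketched as a small routine (a hypothetical helper, not part of Hadoop) that splits the raw yarn classpath output on ':', collapses the "./" noise the yarn script leaves in, and drops duplicates while preserving order:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical cleanup of `yarn classpath` output: canonicalize entries
// and remove duplicates before pasting them into yarn-site.xml.
public class ClasspathCleaner {
    public static String clean(String raw) {
        Set<String> seen = new LinkedHashSet<>(); // dedupes, keeps order
        for (String entry : raw.split(":")) {
            // collapse "/.//" and "/./" segments left in by the yarn script
            String canonical = entry.replace("/.//", "/").replace("/./", "/");
            if (!canonical.isEmpty()) {
                seen.add(canonical);
            }
        }
        return String.join(",\n", seen); // comma-separated, ready for the <value> block
    }

    public static void main(String[] args) {
        String raw = "/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*"
                   + ":/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-yarn/.//*";
        System.out.println(clean(raw));
    }
}
```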

In general, you'll want to package all non-provided dependencies (classes and modules) into your application archive. =:)
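Assuming a Maven build, one common way to do that is the maven-shade-plugin, with the Hadoop/YARN artifacts kept at provided scope so the cluster's own jars are used at runtime (the version below is illustrative):

```xml
<!-- pom.xml fragment: bundle non-provided dependencies into one "fat" jar -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.4.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
    </execution>
  </executions>
</plugin>
```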

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow