Start a java application from Hadoop YARN

https://stackoverflow.com/questions/21836825

12-10-2022

Question

I'm trying to run a Java application from a YARN application (in detail: from the ApplicationMaster in the YARN app). All the examples I found deal with running bash scripts.

My problem seems to be that I distribute the JAR file to the nodes in my cluster incorrectly. I specify the JAR as a local resource in the YARN client.

Path jarPath2 = new Path("/hdfs/yarn1/08_PrimeCalculator.jar");
jarPath2 = fs.makeQualified(jarPath2);

FileStatus jarStat2 = null;
try {
    jarStat2 = fs.getFileStatus(jarPath2);
    log.log(Level.INFO, "JAR path in HDFS is "+jarStat2.getPath());
} catch (IOException e) {
    e.printStackTrace();
}

LocalResource packageResource = Records.newRecord(LocalResource.class);
packageResource.setResource(ConverterUtils.getYarnUrlFromPath(jarPath2));
packageResource.setSize(jarStat2.getLen());
packageResource.setTimestamp(jarStat2.getModificationTime());
packageResource.setType(LocalResourceType.ARCHIVE);
packageResource.setVisibility(LocalResourceVisibility.PUBLIC);

Map<String, LocalResource> res = new HashMap<String, LocalResource>();
res.put("package", packageResource);

So my JAR is supposed to be distributed to the ApplicationMaster and be unpacked since I specify the ResourceType to be an ARCHIVE. On the AM I try to call a class from the JAR like this:

String command = "java -cp './package/*' de.jofre.prime.PrimeCalculator";

The Hadoop logs tell me when running the application: "Could not find or load main class de.jofre.prime.PrimeCalculator". The class exists at exactly the path that is shown in the error message.

Any ideas what I am doing wrong here?

Solution

I found out how to start a Java process from an ApplicationMaster. In fact, my problem lay in the command used to start the process, even though it is the officially documented way provided by the Apache Hadoop project.

What I did now was to specify the packageResource as a file, not an archive:

packageResource.setType(LocalResourceType.FILE);

Now the node manager does not extract the resource but leaves it as a file, in my case a JAR. To start the process I call:

java -jar primecalculator.jar

To start a JAR without specifying a main class on the command line, you have to specify the main class in the MANIFEST file (manually, or let Maven do it for you).
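As a rough illustration of the manifest requirement, the sketch below builds an in-memory JAR whose manifest carries a Main-Class entry and then reads that entry back, using only java.util.jar from the JDK. The class name ManifestSketch and its two helper methods are made up for this example; a real build would normally let Maven or the jar tool write the manifest and would also add the compiled classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical helper class, not part of the original post.
public class ManifestSketch {

    // Builds an in-memory JAR whose manifest names the given main class.
    static byte[] buildJarWithMainClass(String mainClass) throws IOException {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version has to be set, otherwise the main attributes are not written.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        // This is the entry that "java -jar" looks up to find the class to run.
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes, manifest)) {
            // A real build would add the compiled .class files here.
        }
        return bytes.toByteArray();
    }

    // Reads the Main-Class entry back from a JAR, or returns null if there is none.
    static String readMainClass(byte[] jarBytes) throws IOException {
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            Manifest manifest = in.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] jar = buildJarWithMainClass("de.jofre.prime.PrimeCalculator");
        System.out.println("Main-Class: " + readMainClass(jar));
    }
}
```

Running the same kind of check against the JAR you actually upload to HDFS is a quick way to confirm that the -jar launch can find its entry point before submitting the YARN application.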

To sum it up: I did NOT add the resource as an archive but as a file, and I did not use the -cp option to add the symlinked folder that Hadoop creates for the extracted archive. I simply started the JAR via the -jar parameter, and that's it.

Hope it helps you guys!

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow