Question

I have installed Hadoop 2.2.0 on Ubuntu 13.10 in pseudo-distributed mode on my PC, and it starts and runs correctly (only one DataNode). I'm using Eclipse Kepler v4.3 with the Maven plugin to develop my Hadoop program, plus a Dynamic Web Project (the web project does not use Maven).

I have a Hadoop project called "HadWork" from which I created HadWork.jar (right-click on the project, Export, Runnable JAR File, with the option "Extract required libraries into generated JAR"). It works correctly when I run the job from the command line with "hadoop jar HadWork.jar parameter1 parameter2 parameter3", and I can see the job's progress in the terminal.

Now I want to run the job from my dynamic web application deployed on the WildFly 8.0 application server (Eclipse is already configured to run the project on WildFly in standalone mode). I'm writing the servlet "ServletHadoopTest.java", but I don't understand how to run the job from my web application, which libraries (the Hadoop jars? my HadWork.jar?) I need to load, and where to load them. I can't simply use the command "hadoop jar HadWork.jar parameter1 parameter2 parameter3" directly in my servlet. In short, what I want is this: when I click the "StartJob" button on my "index.jsp" page, the HadWork job starts and the page shows a "Working job.." message, possibly with the URL to track the job. I would like to call the MapReduce job from a servlet. I can load HadWork.jar on the application server (if so, where?).

I've been searching on Google but haven't found an answer yet.


Solution

There are basically two options to do what you want to do:

1) Wrap a command-line call to hadoop inside your servlet. That's super ugly, but it is the easiest to set up, because you don't have to mix your web application code base with Hadoop. Here's an example of how to do that:

    // The same command that works in the terminal, split into tokens.
    String[] cmd = new String[] { "hadoop", "jar", "HadWork.jar",
            "parameter1", "parameter2", "parameter3" };
    Process process = Runtime.getRuntime().exec(cmd);
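
If you go this route, you will also want to capture the process output so the page can show progress and the tracking URL. A rough sketch using ProcessBuilder (java.io imports and the servlet's response object assumed; error handling omitted):

    // Merge stderr into stdout so all of the CLI's progress output
    // comes through one stream.
    ProcessBuilder pb = new ProcessBuilder("hadoop", "jar", "HadWork.jar",
            "parameter1", "parameter2", "parameter3");
    pb.redirectErrorStream(true);
    Process process = pb.start();
    BufferedReader reader = new BufferedReader(
            new InputStreamReader(process.getInputStream()));
    String line;
    while ((line = reader.readLine()) != null) {
        response.getWriter().println(line); // forward progress lines to the page
    }
    int exitCode = process.waitFor();       // 0 means the client exited cleanly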

2) The other, better alternative is to package your Hadoop code and the Hadoop dependencies with your servlet. I would strongly suggest you use Maven for dependency management. You will have to do the following to run your MapReduce job from your servlet (see the sketch after this list):

- package Hadoop and your job jar
- create a Configuration object that reflects your cluster (especially the dfs and mapred hosts)
- implement Tool
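
To make this concrete, here is a minimal sketch of such a driver, assuming a class name HadWorkDriver and typical pseudo-distributed host/port values; the commented-out mapper/reducer wiring stands in for your existing job classes, so adjust everything to your setup:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Hypothetical driver for the HadWork job.
    public class HadWorkDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            Configuration conf = getConf();
            // Point the client at your pseudo-distributed cluster
            // (the values below are common defaults; adjust to yours).
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            conf.set("mapreduce.framework.name", "yarn");
            conf.set("yarn.resourcemanager.address", "localhost:8032");

            Job job = Job.getInstance(conf, "HadWork");
            job.setJarByClass(HadWorkDriver.class);
            // job.setMapperClass(...); job.setReducerClass(...); // your existing classes
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            // Submit without blocking so the servlet can respond immediately,
            // then report the URL for the "Working job.." page.
            job.submit();
            System.out.println("Track the job at: " + job.getTrackingURL());
            return 0;
        }
    }

From ServletHadoopTest you would then call ToolRunner.run(new Configuration(), new HadWorkDriver(), args) with your three parameters, and write the tracking URL into the response instead of printing it. The Hadoop client jars (or your shaded HadWork.jar) go into WEB-INF/lib of the deployed WAR, which is the standard place for a web application's dependencies on WildFly.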

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow