Question

I have Hortonworks HDP 2.0 running in the sandbox (recently installed) on Windows 8.1. I need to learn how to get Giraph working with HDP 2.0.

I think Giraph is not installed with HDP 2.0 by default. Can someone help me install Giraph and point me to some hands-on coding tutorials?

Solution

Try combining this Hortonworks MapReduce tutorial and the Giraph Quick Start.

The former shows you how to create a shared folder and copy files between your local and virtual machines. Create a Giraph jar (using the second link), place it in the hue home directory, give it the relevant permissions, and create an input file (as detailed in the first link).

When creating the Giraph jar you will need to compile against Hadoop 2 - I did this using the command mvn -Phadoop_2.0.0 package from the Giraph root directory.

Depending on the version of Giraph you are using, you may have problems running the job as described in the second link. I found that

hadoop jar giraph.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsVertex -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/hue/tinygraph.txt -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/hue/output/shortestpaths -w 1

worked for me (note the differences: the output format is specified with -of instead of -vof, and the class is SimpleShortestPathsVertex instead of SimpleShortestPathsComputation).

When running the jar I ran into an exception

java.lang.IllegalArgumentException: checkLocalJobRunnerConfiguration: When using LocalJobRunner, you cannot run in split master / worker mode since there is only 1 task at a time!

which I fixed by adding the line

job.getConfiguration().setBoolean("giraph.SplitMasterWorker", false);

to org.apache.giraph.GiraphRunner.java in giraph-core.
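
As a possible alternative to patching the source, GiraphRunner accepts custom configuration values through its -ca option, so the same property can probably be set when the job is launched. The sketch below only prints such a command for inspection; the giraph_cmd helper is hypothetical, and whether -ca takes effect early enough in your Giraph version is an assumption you would need to verify.

```shell
# Hypothetical helper: print (do not run) a GiraphRunner command line that
# sets giraph.SplitMasterWorker=false via -ca instead of editing GiraphRunner.java.
# $1 = Giraph jar, $2 = HDFS input file, $3 = HDFS output directory
giraph_cmd() {
  printf '%s ' hadoop jar "$1" org.apache.giraph.GiraphRunner \
    org.apache.giraph.examples.SimpleShortestPathsVertex \
    -ca giraph.SplitMasterWorker=false \
    -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
    -vip "$2" \
    -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op "$3" \
    -w 1
  printf '\n'
}
```

Calling giraph_cmd giraph.jar /user/hue/tinygraph.txt /user/hue/output/shortestpaths prints the full command, which can then be pasted into the sandbox shell.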

I also ran into problems with ZooKeeper ports, which I detailed, along with a workaround, here.

Hope this helps!

OTHER TIPS

I have used FBUnicorn's answer above to compile a full guide on how to achieve installing Giraph 1.2.0 on top of a freshly deployed instance of Hortonworks (HDP 2.2).

I deployed HDP using VirtualBox, as that VM had internet connectivity out of the box, which was not the case with the VMware equivalent.

These are the steps:

Clone the Giraph git repository

cd /usr/local/
sudo git clone https://github.com/apache/giraph.git

Add user to CentOS

sudo useradd -G hadoop hduser
sudo passwd hduser
sudo chown -R hduser:hadoop giraph
su - hduser

Install Maven (mvn) on CentOS (w/ the help of this article)

wget http://mirror.cc.columbia.edu/pub/software/apache/maven/maven-3/3.0.5/binaries/apache-maven-3.0.5-bin.tar.gz
sudo tar xzf apache-maven-3.0.5-bin.tar.gz -C /usr/local
cd /usr/local
sudo ln -s apache-maven-3.0.5 maven

Maven setup

sudo vi /etc/profile.d/maven.sh

Insert

export M2_HOME=/usr/local/maven
export PATH=${M2_HOME}/bin:${PATH}

Log out and log back in. Ensure that version 3 or greater of maven is available.

mvn -version
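
If you want to script that version check, the following is a minimal sketch. It assumes the first line of the mvn -version output looks like "Apache Maven 3.0.5 (...)"; the function names maven_major and maven_ok are made up for this example.

```shell
# Read `mvn -version` output on stdin and print the major version number,
# e.g. "Apache Maven 3.0.5 (r01de...)" -> 3. Prints nothing if no match.
maven_major() {
  sed -n 's/^Apache Maven \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Succeed only if the mvn on PATH reports major version 3 or newer.
maven_ok() {
  major="$(mvn -version 2>/dev/null | maven_major)"
  [ -n "$major" ] && [ "$major" -ge 3 ]
}
```

For example, maven_ok && echo "Maven 3+ found" reports success only when the profile script above has been picked up.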

Export Hadoop and Giraph directories

vi $HOME/.bashrc

Add

export HADOOP_HOME=/usr/hdp/2.2.0.0-2041/hadoop
export GIRAPH_HOME=/usr/local/giraph

Modify GiraphRunner.java as per FBUnicorn's answer (in /usr/local/giraph/giraph-core/src/main/java/org/apache/giraph) by adding

job.getConfiguration().setBoolean("giraph.SplitMasterWorker", false);

before boolean verbose = !cmd.hasOption('q');

Compile Giraph

source $HOME/.bashrc
cd $GIRAPH_HOME
mvn -Phadoop_2 -fae -DskipTests clean install

Check that the jars are generated in the $GIRAPH_HOME/giraph-core/target/ folder

Create a test example with a tiny graph

vi /tmp/tiny_graph.txt

Insert

[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]
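
Each line of this file describes one vertex in the form [vertex_id,vertex_value,[[target_id,edge_weight],...]], which is what JsonLongDoubleFloatDoubleVertexInputFormat reads. As a non-interactive alternative to vi, here is a minimal sketch that writes the same file and checks that every line has that shape. The GRAPH_FILE variable and the regular expression are assumptions of this example, and the check is only a rough sanity test, not a JSON parser.

```shell
# Write the tiny graph without opening an editor.
GRAPH_FILE="${GRAPH_FILE:-/tmp/tiny_graph.txt}"
cat > "$GRAPH_FILE" <<'EOF'
[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]
EOF

# Count lines that do NOT look like [id,value,[[target,weight],...]].
bad=$(grep -cvE '^\[[0-9]+,[0-9.]+,\[(\[[0-9]+,[0-9.]+],?)*]]$' "$GRAPH_FILE")
if [ "$bad" -eq 0 ]; then
  echo "graph file looks well-formed"
else
  echo "$bad malformed line(s) in $GRAPH_FILE" >&2
fi
```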

Create the HDFS folder

hadoop fs -mkdir -p /user/hduser/input

Copy the graph to HDFS

hadoop fs -copyFromLocal /tmp/tiny_graph.txt /user/hduser/input/tiny_graph.txt

Check that the file made it to the HDFS repository

hadoop fs -ls /user/hduser/input

Process Giraph graph

hadoop jar /usr/local/giraph/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/hduser/input/tiny_graph.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/hduser/output/shortestpaths -w 1
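
To sanity-check the job's results, it can help to compute the expected distances locally. The sketch below is not Giraph code: it is a plain awk relaxation loop over the same input file, wrapped in a hypothetical shortest_paths function. It assumes the example uses vertex 1 as the source (believed to be the default of the shortest-paths example, but verify for your version) and that the output format writes one "id<TAB>value" line per vertex.

```shell
# Local reference computation of single-source shortest paths.
# $1 = graph file in [id,value,[[target,weight],...]] form, $2 = source vertex id
shortest_paths() {
  awk -v src="$2" '
    {
      gsub(/\[/, ""); gsub(/\]/, "")      # drop all square brackets
      n = split($0, f, ",")               # f[1]=id, f[2]=value, then target,weight pairs
      ids[f[1]] = 1
      for (i = 3; i + 1 <= n; i += 2) {
        m++; from[m] = f[1]; to[m] = f[i]; w[m] = f[i + 1]
      }
    }
    END {
      for (v in ids) dist[v] = -1         # -1 marks "not reached yet"
      dist[src] = 0
      do {                                # relax every edge until nothing changes
        changed = 0
        for (e = 1; e <= m; e++) {
          if (dist[from[e]] < 0) continue
          cand = dist[from[e]] + w[e]
          if (dist[to[e]] < 0 || cand < dist[to[e]]) { dist[to[e]] = cand; changed = 1 }
        }
      } while (changed)
      for (v in ids) printf "%s\t%.1f\n", v, dist[v]
    }
  ' "$1" | sort -n
}
```

Running shortest_paths /tmp/tiny_graph.txt 1 on the tiny graph gives distances 1.0, 0.0, 2.0, 1.0 and 5.0 for vertices 0 to 4. These can be compared with the part files the job writes under /user/hduser/output/shortestpaths (for example with hadoop fs -cat on that directory's part files). Note that Hadoop normally refuses to write into an output directory that already exists, so remove it before re-running the job.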
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow