Question

Hi there, I have a Hadoop cluster, and I am thinking about writing my own mapper and reducer in R and then using Hadoop Streaming to do some time series analysis.

However, I am wondering what the 'common' way is to install a given piece of software across all of the nodes in the cluster.

There might exist some magic like:

sudo hadoop install R? # Pseudo code

Thanks!

Solution

I actually ended up using Expect on Linux to automate the installation.

#!/usr/bin/expect -f

# Usage: ssh.exp <server>
# Connects to one node and installs R from the EPEL repository.
if {[llength $argv] != 1} {
    puts "usage: ssh.exp server"
    exit 1
}

set server [lindex $argv 0]
set timeout 60

spawn ssh -i key.pem ec2-user@$server

# Accept the host key on first connection; if the host is already
# known, fall through as soon as the shell prompt appears.
expect {
    "*connecting (yes/no)*" { send -- "yes\r"; exp_continue }
    "*~]$*"
}

# Enable the EPEL repository, which provides the R packages.
send -- "sudo su -c 'rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm'\r"
expect "*~]$*"

# Install R in the background so a slow yum run does not block the session.
send -- "nohup sudo su -c 'yum install -y R R-core R-core-devel R-devel' &\r"

expect "*~]$*"
send -- "exit\r"

interact
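
To roll this out to the whole cluster, run the script once per node. Below is a minimal driver sketch in R; the file hosts.txt, holding one hostname per line, is an assumption, not part of the original setup:

# run_all.R -- invoke the Expect script against every node in the cluster.
# hosts.txt (one hostname per line) is assumed to exist; adjust as needed.
for (host in readLines("hosts.txt")) {
  system(paste("./ssh.exp", host))
}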

Other tips

Automate the installation process across the cluster with configuration-management tools such as Puppet or Chef.

Also, there are wrappers around R, such as the one from Revolution Analytics and RHIPE, that make it easier to write MapReduce programs in R.
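
Even without those wrappers, plain R scripts work with Hadoop Streaming, since the framework only exchanges lines over stdin/stdout. A minimal sketch of a mapper and reducer follows; the comma-separated input format "series_id,timestamp,value" and the per-series mean are assumptions chosen for illustration:

#!/usr/bin/env Rscript
# mapper.R -- read raw lines from stdin and emit "key<TAB>value" pairs.
# Input lines are assumed to look like "series_id,timestamp,value".
con <- file("stdin", open = "r")
while (length(line <- readLines(con, n = 1)) > 0) {
  fields <- strsplit(line, ",", fixed = TRUE)[[1]]
  cat(fields[1], "\t", fields[3], "\n", sep = "")
}
close(con)

#!/usr/bin/env Rscript
# reducer.R -- Hadoop Streaming delivers the mapper output sorted by key,
# so all values for one series arrive together; emit the mean per series.
con <- file("stdin", open = "r")
current <- NULL
values <- numeric(0)
while (length(line <- readLines(con, n = 1)) > 0) {
  parts <- strsplit(line, "\t", fixed = TRUE)[[1]]
  if (!is.null(current) && parts[1] != current) {
    cat(current, "\t", mean(values), "\n", sep = "")
    values <- numeric(0)
  }
  current <- parts[1]
  values <- c(values, as.numeric(parts[2]))
}
if (!is.null(current)) cat(current, "\t", mean(values), "\n", sep = "")
close(con)

The two scripts would then be submitted with the hadoop-streaming jar, passed via -mapper and -reducer and shipped to the nodes with -file.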

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow