I'm trying to run RHadoop on Cloudera's Hadoop distribution (I can't remember whether it's CDH3 or CDH4), and I'm running into an issue: RStudio Server doesn't seem to recognize my global environment variables.

In my /etc/profile.d/r.sh file, I have:

export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CONF=/usr/hadoop/conf
export HADOOP_CMD=/usr/bin/hadoop
export HADOOP_STREAMING=/usr/lib/hadoop-mapreduce/

When I run R from the terminal, I get:

> Sys.getenv("HADOOP_CMD")
[1] "/usr/bin/hadoop"

But when I run the same thing in RStudio Server:

> Sys.getenv("HADOOP_CMD")
[1] ""

And as a result, when I try to run rhdfs:

> library("rJava", lib.loc="/home/cloudera/R/x86_64-redhat-linux-gnu-library/2.15")
> library("rhdfs", lib.loc="/home/cloudera/R/x86_64-redhat-linux-gnu-library/2.15")
Error : .onLoad failed in loadNamespace() for 'rhdfs', details: 
    call: fun(libname, pkgname)
    error: Environment variable HADOOP_CMD must be set before loading package rhdfs
Error: package/namespace load failed for 'rhdfs'

Does anyone know where I should put my environment variables, if not in that r.sh file?

Thanks!


Solution

You should set your environment variables in .Renviron or Renviron.site. The site-wide file is R_HOME/etc/Renviron.site; a per-user .Renviron goes in your home directory. You can get more information by typing:

> ?Startup

Someone had a similar issue here and this is what he did to solve it.
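
As far as I know, RStudio Server does not start R sessions through a login shell, which would explain why /etc/profile.d/r.sh is never sourced there. Renviron files use plain NAME=value lines without the export keyword, so the settings from the question need a small conversion. Here is a rough sketch of that conversion; the to_renviron helper and the temporary files are made up for illustration, and it assumes the values are simple paths with no quoting or shell expansion:

```shell
#!/bin/sh
# Sketch: turn the "export NAME=value" lines of a profile script into the
# plain NAME=value lines that R's Renviron files expect.
# Assumption: values are simple paths, with no quotes or $VAR expansion.

# Hypothetical helper, not part of R or RStudio.
to_renviron() {
  # Print only the export lines, with the leading "export " stripped.
  sed -n 's/^[[:space:]]*export[[:space:]][[:space:]]*\([A-Za-z_][A-Za-z0-9_]*=.*\)$/\1/p' "$1"
}

# Throwaway copy of the settings from the question.
profile=$(mktemp)
cat > "$profile" <<'EOF'
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CONF=/usr/hadoop/conf
export HADOOP_CMD=/usr/bin/hadoop
export HADOOP_STREAMING=/usr/lib/hadoop-mapreduce/
EOF

# Write the converted lines to a scratch file and show them.
renviron=$(mktemp)
to_renviron "$profile" > "$renviron"
cat "$renviron"
```

The converted lines can then be appended to Renviron.site (running R RHOME prints the R home directory; writing under it usually needs root) or to ~/.Renviron. Start a new R session in RStudio Server afterwards so the file is read again.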

Other tips

Note that on Windows, R looks for the .Renviron file in /Users/<name>/Documents, while RStudio appears to expect the .Renviron file to be in /Users/<name>/.

You can also set the environment variable from inside RStudio. Sys.setenv needs a name as well as a value, for example:

Sys.setenv(HADOOP_CMD="/path to hadoop")

and then try loading rhdfs again.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow