Question

I need to process data stored on Hadoop in R (some clustering and statistics). I previously used Hive to analyze the data. I found the RJDBC package for R and would like to use it, but it doesn't work; it seems a lot of jars are not available. Could you point me to good instructions or a tutorial? How do I query data from Hive in R?

Solution

You need to copy Hive's jars to your R classpath and load them into RJDBC. You can read the details, with a sample, on my blog: http://simpletoad.blogspot.com/2013/12/r-connection-to-hive.html
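
As a rough illustration of the jar-loading step, here is a minimal sketch and not the blog's exact code. It assumes HiveServer2 (driver class org.apache.hive.jdbc.HiveDriver, jdbc:hive2:// URLs) and that the RJDBC package is installed. The helper names (hive_jars, hive_url, connect_hive), the directory paths, the host, the port, and the table name are all made up for the example and will differ on your cluster.

```r
# Sketch only: helper names, paths, host, port and table are assumptions.

# Collect every .jar found (recursively) under the given directories.
hive_jars <- function(dirs) {
  list.files(dirs, pattern = "\\.jar$", full.names = TRUE, recursive = TRUE)
}

# Build a HiveServer2 JDBC URL.
hive_url <- function(host = "localhost", port = 10000, db = "default") {
  sprintf("jdbc:hive2://%s:%d/%s", host, as.integer(port), db)
}

# Load the jars into RJDBC and open a connection.
connect_hive <- function(jar_dirs, url = hive_url(), user = "", password = "") {
  drv <- RJDBC::JDBC("org.apache.hive.jdbc.HiveDriver",
                     classPath = hive_jars(jar_dirs))
  DBI::dbConnect(drv, url, user, password)
}

# Usage (needs a running HiveServer2; the paths below are hypothetical):
# conn <- connect_hive(c("/usr/local/hive/lib",
#                        "/usr/local/hadoop/share/hadoop/common"))
# df <- DBI::dbGetQuery(conn, "SELECT * FROM my_table LIMIT 10")
# DBI::dbDisconnect(conn)
```

If the driver class cannot be found, the usual cause is that one of the Hive or Hadoop jars is still missing from the directories passed to connect_hive.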

OTHER TIPS

Alternatively, you can use the RHive package to connect to HiveServer2 from R. These are the commands I used:

    library(RHive)
    Sys.setenv(HIVE_HOME="/usr/local/hive")
    Sys.setenv(HADOOP_HOME="/usr/local/hadoop")
    rhive.env(ALL=TRUE)
    rhive.init()
    rhive.connect("localhost")

Licensed under: CC-BY-SA with attribution