Question

I am trying to use rpy2 so that I can call some R functionality from Python. Here is a simple regression I want to run: I create a pandas data frame, convert it to an R data frame, and then try to fit it with R's lm. But R cannot find the data frame (see below). Where should I look to troubleshoot?

FYI, I am using Python 2.7.3, rpy2 2.3.2, pandas 0.10.1 and R 2.15.3.

>>> import rpy2
>>> import pandas as pd
>>> import pandas.rpy.common as com
>>> datframe = pd.DataFrame({'a' : [1, 2, 3], 'b' : [3, 4, 5]})
>>> r_df = com.convert_to_r_dataframe(datframe)
>>> r_df     
<DataFrame - Python:0x32547e8 / R:0x345d640>
[IntVector, IntVector]
  a: <class 'rpy2.robjects.vectors.IntVector'>
  <IntVector - Python:0x3254e18 / R:0x345d608>
[       1,        2,        3]
  b: <class 'rpy2.robjects.vectors.IntVector'>
  <IntVector - Python:0x3254e60 / R:0x345d5d0>
[       3,        4,        5]
>>> print type(r_df)
<class 'rpy2.robjects.vectors.DataFrame'>
>>> from rpy2.robjects import r
>>> r('lmout <- lm(r_df$a ~ r_df$b)')

Error in eval(expr, envir, enclos) : object 'r_df' not found
Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    r('lmout <- lm(r_df$a ~ r_df$b)')
  File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/__init__.py", line 236, in __call__
    res = self.eval(p)
  File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 86, in __call__
    return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 35, in __call__
    res = super(Function, self).__call__(*new_args, **new_kwargs)
RRuntimeError: Error in eval(expr, envir, enclos) : object 'r_df' not found

Solution

When calling

r('lmout <- lm(r_df$a ~ r_df$b)')

the embedded R will look for a variable r_df, and no such variable is made visible to R in your code example.

When doing

r_df = com.convert_to_r_dataframe(datframe)

you are creating the variable r_df on the Python side. The actual data is now in R, but no symbol (name) known to R is associated with it, so the data structure remains anonymous there. (By the way, you may want to use the automatic conversion of pandas data frames that ships with rpy2-2.3.3.)

To create a variable name in R's "global environment", add this:

from rpy2.robjects import globalenv
globalenv['r_df'] = r_df

Now your lm() call should work.
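
Putting the pieces together, a minimal sketch of the whole flow might look like the following. It assumes rpy2 and a working R installation; fit_lm_via_globalenv is a hypothetical helper name, and the R data frame is built directly with rpy2's DataFrame instead of going through pandas, so the sketch does not depend on any particular pandas conversion API.

```python
def fit_lm_via_globalenv(a, b):
    """Fit a ~ b in R and return [intercept, slope] as Python floats.

    Hypothetical helper for illustration; requires rpy2 and R.
    """
    # Imported here so this module can be loaded even without rpy2.
    import rpy2.robjects as ro

    # Build an R data frame from two numeric columns.
    r_df = ro.DataFrame({'a': ro.FloatVector(a),
                         'b': ro.FloatVector(b)})

    # Bind the anonymous R object to the name 'r_df' in R's global
    # environment so that R code evaluated from a string can see it.
    ro.globalenv['r_df'] = r_df

    # R can now resolve the symbol r_df.
    ro.r('lmout <- lm(a ~ b, data = r_df)')
    return list(ro.r('coef(lmout)'))
```

Calling fit_lm_via_globalenv([1, 2, 3], [3, 4, 5]) should return an intercept close to -2 and a slope close to 1, since a equals b - 2 in the example data.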

OTHER TIPS

Try this (I am not sure which import does the magic, though):

import rpy2.robjects as robjects
from rpy2.robjects import DataFrame, Formula
import rpy2.robjects.numpy2ri as npr
import numpy as np
from rpy2.robjects.packages import importr


def my_linear_fit_using_r(X, Y, verbose=True):
    """Fit Y ~ X with R's lm; return (a, b, r2, lwr_a, upr_a, lwr_b, upr_b)."""
    # The R functions take x and y as arguments, so no variable has to
    # exist in R's global environment.
    r_correlation = robjects.r('function(x,y) cor.test(x,y)')  # unused below
    # r_quadfit = robjects.r('function(x,y) lm(y~I(x)+I(x^2))')
    r_linfit = robjects.r('function(x,y) lm(y~x)')
    r_get_r2 = robjects.r('function(x) summary(x)$r.squared')
    lin = r_linfit(robjects.FloatVector(X), robjects.FloatVector(Y))
    coef_lin = robjects.r.coef(lin)
    a = coef_lin[0]  # intercept
    b = coef_lin[1]  # slope
    r2 = r_get_r2(lin)
    # confint returns a 2x2 matrix stored column by column:
    # lower intercept, lower slope, upper intercept, upper slope
    ci = robjects.r.confint(lin)
    lwr_a = ci[0]
    lwr_b = ci[1]
    upr_a = ci[2]
    upr_b = ci[3]
    if verbose:
        print(robjects.r.summary(lin))
        # print(robjects.r.summary(quad))
    return (a, b, r2[0], lwr_a, upr_a, lwr_b, upr_b)

Just a remark: for simple regressions you can stay entirely in Python and use ols from statsmodels:

from statsmodels.formula.api import ols

lmout = ols('a ~ b', datframe).fit()
lmout.summary()
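
Whichever route is used, the coefficients for the example data can be checked by hand, because a is exactly b - 2. The following is a minimal sketch in plain Python of the closed-form least-squares fit; simple_ols is a hypothetical helper written for this check and is not part of statsmodels or rpy2.

```python
def simple_ols(x, y):
    """Closed-form least squares for y = intercept + slope * x."""
    n = float(len(x))
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


a = [1, 2, 3]
b = [3, 4, 5]
# Regress a on b, matching the formula 'a ~ b'.
intercept, slope = simple_ols(b, a)
print(intercept, slope)
```

Both R's lm and the statsmodels fit should report an intercept near -2 and a slope near 1 for the same data.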
Licensed under: CC-BY-SA with attribution