How to use pandas dataframes and numpy arrays in Rpy2?

Question 1

[note: Your code in "edit 2" is working here (Python 2.7, rpy2-2.3.2, R-1.15.2).]

As @dale mentions, whenever R objects are anonymous (that is, no R symbol exists for the object), R's deparse(substitute()) ends up returning the structure() of the R object. A possible fix is to specify the "xlab" and "ylab" parameters; for some plots you will also have to specify "main" (the title).

Another way to work around that is to use R's formulas and pass in the data frame (more below, once the conversion part is worked out).

Forget about what is in pandas.rpy. It is broken and seems to ignore features available in rpy2.

An earlier quick fix to conversion with ipython can be turned into a proper conversion rather easily. I am considering adding one to the rpy2 codebase (with more bells and whistles), but in the meantime just add the following snippet after all your imports in your code examples. It will transparently convert pandas' DataFrame objects into rpy2's DataFrame whenever an R call is made.

from collections import OrderedDict
import pandas
import rpy2.robjects
import rpy2.robjects as ro
from rpy2.robjects.vectors import ListVector

# keep a reference to the original converter to use it as a fallback
py2ri_orig = rpy2.robjects.conversion.py2ri
def conversion_pydataframe(obj):
    if isinstance(obj, pandas.core.frame.DataFrame):
        od = OrderedDict()
        for name, values in obj.iteritems():
            if values.dtype.kind == 'O':
                # object columns are converted to R character vectors
                od[name] = rpy2.robjects.vectors.StrVector(values)
            else:
                od[name] = rpy2.robjects.conversion.py2ri(values)
        return rpy2.robjects.vectors.DataFrame(od)
    elif isinstance(obj, pandas.core.series.Series):
        # converted as a numpy array (by the original converter)
        res = py2ri_orig(obj)
        # "index" is equivalent to "names" in R
        if obj.ndim == 1:
            res.names = ListVector({'x': ro.conversion.py2ri(obj.index)})
        else:
            res.dimnames = ListVector(ro.conversion.py2ri(obj.index))
        return res
    else:
        # anything else is handled by the original converter
        return py2ri_orig(obj)
# install the new converter in place of the original one
rpy2.robjects.conversion.py2ri = conversion_pydataframe

Now the following code will "just work":

r.plot(rpy2.robjects.Formula('c3~c2'), data)
# `data` is converted to an rpy2 data.frame on the fly
# and a scatter plot of c3 vs c2 is drawn (with "c2" and "c3" as the
# labels on the "x" axis and "y" axis).
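If the column names are only known at run time, the formula text can be assembled before it is handed to rpy2.robjects.Formula. The helper below is a minimal sketch of that step only; the name formula_text is made up for illustration and is not part of rpy2.

```python
def formula_text(response, *terms):
    """Build the text of an R model formula such as 'c3 ~ c2'.

    `formula_text` is a hypothetical helper, not an rpy2 function.
    """
    if not terms:
        raise ValueError("at least one right-hand-side term is required")
    return "{0} ~ {1}".format(response, " + ".join(terms))


print(formula_text("c3", "c2"))        # c3 ~ c2
print(formula_text("c3", "c1", "c2"))  # c3 ~ c1 + c2
```

The resulting string would then be passed along the lines of r.plot(rpy2.robjects.Formula(formula_text('c3', 'c2')), data), assuming the conversion snippet above is in place.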

I also note that you are importing ggplot2 without using it. With ggplot2, the conversion currently has to be requested explicitly. For example:

p = ggplot2.ggplot(rpy2.robjects.conversion.py2ri(data)) +\
    ggplot2.geom_histogram(ggplot2.aes_string(x = 'c3'))
p.plot()
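The conversion snippet earlier in this answer follows a simple pattern: keep a reference to the converter that was in place, handle the types you care about, and delegate everything else to the original. The sketch below shows only that pattern in plain Python, so it runs without R or rpy2. Table, default_convert and make_converter are hypothetical stand-ins and not rpy2 or pandas names.

```python
from collections import OrderedDict


class Table(object):
    """Hypothetical stand-in for a pandas DataFrame (name -> column values)."""

    def __init__(self, columns):
        self.columns = OrderedDict(columns)


def default_convert(obj):
    """Hypothetical stand-in for the library's original converter."""
    return ("default", obj)


def make_converter(fallback):
    """Return a converter that handles Table itself and delegates the rest."""
    def convert(obj):
        if isinstance(obj, Table):
            # Convert column by column, keeping the column order.
            return ("table", [(name, fallback(values))
                              for name, values in obj.columns.items()])
        # Anything else goes to the converter that was in place before.
        return fallback(obj)
    return convert


convert = make_converter(default_convert)
print(convert(3.14))
print(convert(Table([("c2", [1, 2]), ("c3", [3, 4])])))
```

Keeping the fallback explicit means the types the original converter already handled keep working after the new one is installed.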

Question 2

You need to pass in the labels explicitly when calling the r.plot function.

r.plot([1,2,3],[1,2,3], xlab="X", ylab="Y")

When you plot in R, it grabs the labels via deparse(substitute(x)), which essentially picks up the variable names from the call plot(testX, testY). When you pass in Python objects via rpy2, each one is an anonymous R object, akin to the following in R:

> deparse(substitute(c(1,2,3)))
[1] "c(1, 2, 3)"

which is why you're getting the crazy labels.
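One way to avoid forgetting the labels is to build the keyword arguments once and reuse them for every plot call. The helper below is a small sketch of that idea; plot_labels is a made-up name, while xlab, ylab and main are the usual arguments of R's plot.

```python
def plot_labels(x_name, y_name, title=None):
    """Return keyword arguments carrying explicit axis labels for R's plot().

    `plot_labels` is a hypothetical helper, not part of rpy2.
    """
    labels = {"xlab": x_name, "ylab": y_name}
    if title is not None:
        # "main" is the plot title in R
        labels["main"] = title
    return labels


print(plot_labels("A", "B"))
print(plot_labels("A", "B", title="A vs B"))
```

With rpy2 loaded, the call would then look something like r.plot(df.A, df.B, **plot_labels("A", "B")), which is not run here.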

A lot of the time it's saner to use rpy2 only to push data back and forth.

r.assign('testX', df.A)
r.assign('testY', df.B)
%R plot(testX, testY)

rdf = com.convert_to_r_dataframe(df)
r.assign('bob', rdf)
%R plot(bob$$A, bob$$B)

http://nbviewer.ipython.org/4734581/

Question 3

Use rpy. The conversion is part of pandas, so you don't need to do it yourself: http://pandas.pydata.org/pandas-docs/dev/r_interface.html

In the session below, com is pandas.rpy.common (import pandas.rpy.common as com).

In [1217]: from pandas import DataFrame

In [1218]: df = DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C':[7,8,9]},
   ......:                index=["one", "two", "three"])
   ......:

In [1219]: r_dataframe = com.convert_to_r_dataframe(df)

In [1220]: print type(r_dataframe)
<class 'rpy2.robjects.vectors.DataFrame'>