Question

Unhindered by any pre-existing knowledge of R, rpy2 and ggplot2, I would nevertheless like to create a scatter plot of a trivial table from Python.

To set this up I've just installed:

  • Ubuntu 11.10 64 bit
  • R version 2.14.2 (from r-cran mirror)
  • ggplot2 (through R> install.packages('ggplot2'))
  • rpy2-2.2.5 (through easy_install)

Following this I am able to plot some example dataframes from an interactive R session using ggplot2.

However, when I merely try to import ggplot2 as I've seen in an example I found online, I get the following error:

from rpy2.robjects.lib import ggplot2
  File ".../rpy2/robjects/lib/ggplot2.py", line 23, in <module>
    class GGPlot(robjects.RObject):
  File ".../rpy2/robjects/lib/ggplot2.py", line 26, in GGPlot
    _rprint = ggplot2_env['print.ggplot']
  File ".../rpy2/robjects/environments.py", line 14, in __getitem__
    res = super(Environment, self).__getitem__(item)
LookupError: 'print.ggplot' not found

Can anyone tell me what I am doing wrong? As I said, the offending import comes from an online example, so it might well be that there is some other way I should be using ggplot2 through rpy2.


For reference, and unrelated to the problem above, here's an example of the dataframe I would like to plot once I get the import to work (which should not be a problem, looking at the examples). The idea is to create a scatter plot with the lengths on the x axis and the percentages on the y axis, using the boolean to color the dots. I would then like to save the plot to a file (either an image or a PDF). Given that these requirements are very limited, alternative solutions are welcome as well.

     original.length row.retained percentage.retained
1               1875        FALSE                11.00
2               1143        FALSE                23.00
3                960        FALSE                44.00
4               1302        FALSE                66.00
5               2016        TRUE                 87.00

Solution

There were changes in the R package ggplot2 that broke the rpy2 layer. Try with a recent (I just fixed this) snapshot of the "default" branch (rpy2-2.3.0-dev) for the rpy2 code on bitbucket.

Edit: rpy2-2.3.0 is a couple of months behind schedule. I just pushed a bugfix release rpy2-2.2.6 that should address the problem.
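
To see which side of that fix an installation is on, a small check along these lines may help. It is only a sketch: the helper names `version_tuple` and `check_rpy2_ggplot2` are made up for illustration, the 2.2.6 threshold is taken from the note above, and `importlib.metadata` assumes Python 3.8 or later.

```python
from importlib import metadata


def version_tuple(version):
    """Turn '2.2.5' into (2, 2, 5); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_rpy2_ggplot2(min_version=(2, 2, 6)):
    """Return (ok, message) saying whether the rpy2 ggplot2 layer imports."""
    try:
        installed = metadata.version("rpy2")
    except metadata.PackageNotFoundError:
        return False, "rpy2 is not installed"
    if version_tuple(installed) < min_version:
        return False, "rpy2 %s predates the bugfix release" % installed
    try:
        # This is the import that raised LookupError in the question.
        from rpy2.robjects.lib import ggplot2  # noqa: F401
    except LookupError as err:
        return False, "rpy2 and ggplot2 versions do not match: %s" % err
    except Exception as err:
        # Deliberately broad: a missing R installation can fail in several ways.
        return False, "import failed: %s" % err
    return True, "rpy2 %s imports the ggplot2 layer" % installed
```

Calling `check_rpy2_ggplot2()` returns a pair such as `(False, "rpy2 is not installed")`, so the message can be printed or logged before any plotting code runs.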

OTHER TIPS

Although I can't help you with a fix for the import error you're seeing, there is a similar example using lattice here: lattice with rpy2.

Also, the standard R plot function accepts coloring by using the factor function (which you can feed the row.retained column). Example:

plot(original.length, percentage.retained, type="p", col=factor(row.retained))
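
For readers coming from Python, the reason this works is that a factor stores each value as a 1-based integer code into its sorted levels, and plot then treats those codes as indices into the color palette. The following is a rough pure-Python imitation of that encoding, not R's actual implementation; `factor_codes` is a name invented for this sketch.

```python
def factor_codes(values):
    """Imitate R's factor(): sorted unique levels plus 1-based integer codes."""
    levels = sorted(set(values))
    index = {level: i + 1 for i, level in enumerate(levels)}
    return levels, [index[v] for v in values]


row_retained = [False, False, False, False, True]
levels, codes = factor_codes(row_retained)
print(levels)  # [False, True]
print(codes)   # [1, 1, 1, 1, 2]
```

The four FALSE rows share code 1 and the single TRUE row gets code 2, so the two groups end up with two different palette colors.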

Based on fucitol's answer I've instead implemented the plot using both the default plot & lattice. Here are both the implementations:

from rpy2 import robjects
#Convert to R objects
original_lengths = robjects.IntVector(original_lengths)
percentages_retained = robjects.FloatVector(percentages_retained)
row_retained = robjects.StrVector(row_retained)

#Plot using standard plot
r = robjects.r
r.plot(x=percentages_retained,
       y=original_lengths,
       col=row_retained,
       main='Title',
       xlab='Percentage retained',
       ylab='Original length',
       sub='subtitle',
       pch=18)

#Plot using lattice
from rpy2.robjects import Formula
from rpy2.robjects.packages import importr
lattice = importr('lattice')
formula = Formula('lengths ~ percentages')
formula.getenvironment()['lengths'] = original_lengths
formula.getenvironment()['percentages'] = percentages_retained

p = lattice.xyplot(formula,
                   col=row_retained,
                   main='Title',
                   xlab='Percentage retained',
                   ylab='Original length',
                   sub='subtitle',
                   pch=18)
rprint = robjects.globalenv.get("print")
rprint(p)

It's a shame I can't get ggplot2 to work, as it produces nicer graphs by default and I regard working with dataframes as more explicit. Any help in that direction is still welcome!
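
For comparison, once the import succeeds (for example with the rpy2 bugfix release mentioned in the accepted solution), a ggplot2 version of the same plot might look roughly like the sketch below. It has not been run against the versions listed in the question. The function names and the output file name `scatter.png` are made up for illustration, and `aes_string` is assumed to be available in the installed rpy2 ggplot2 layer.

```python
def table_columns():
    """The example table from the question as plain Python lists."""
    return {
        "original.length": [1875, 1143, 960, 1302, 2016],
        "row.retained": [False, False, False, False, True],
        "percentage.retained": [11.0, 23.0, 44.0, 66.0, 87.0],
    }


def save_scatter(path="scatter.png"):
    """Draw the scatter plot with ggplot2 and write it to a PNG file."""
    # Imported lazily so that only this function needs R and rpy2.
    from rpy2 import robjects
    from rpy2.robjects.lib import ggplot2
    from rpy2.robjects.packages import importr

    grdevices = importr("grDevices")
    cols = table_columns()
    df = robjects.DataFrame({
        "original.length": robjects.IntVector(cols["original.length"]),
        "row.retained": robjects.BoolVector(cols["row.retained"]),
        "percentage.retained": robjects.FloatVector(cols["percentage.retained"]),
    })
    # Lengths on x, percentages on y, the boolean column as point color.
    plot = (ggplot2.ggplot(df)
            + ggplot2.aes_string(x="original.length",
                                 y="percentage.retained",
                                 colour="row.retained")
            + ggplot2.geom_point())
    grdevices.png(file=path, width=512, height=512)
    plot.plot()
    grdevices.dev_off()
```

Calling `save_scatter()` requires a working R installation with ggplot2; `table_columns()` alone has no such dependency.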

If you have experience with Python but not with R, you can use numpy or pandas for the data analysis and matplotlib for the plotting.

Here is a small example of what that feels like:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 

df = pd.DataFrame({'original_length': [1875, 1143, 960, 1302, 2016],
                   'row_retained': [False, False, False, False, True],
                   'percentage_retained': [11.0, 23.0, 44.0, 66.0, 87.0]})
fig, ax = plt.subplots()
ax.scatter(df.original_length, df.percentage_retained,
           c=np.where(df.row_retained, 'green', 'red'),
           s=np.random.randint(50, 500, 5)
           )   
# Select the single row where row_retained is True and annotate that point
true_value = df[df.row_retained].iloc[0]
ax.annotate('This one is True',
            xy=(true_value.original_length, true_value.percentage_retained),
            xytext=(0.1, 0.001), textcoords='figure fraction',
            arrowprops=dict(arrowstyle="->"))
ax.grid()
ax.set_xlabel('Original Length')
ax.set_ylabel('Percentage Retained')
ax.margins(0.04)
plt.tight_layout()
plt.savefig('alternative.png')

[Figure: alternative.png, the scatter plot saved by the code above]

pandas also has an experimental rpy2 interface.

The problem is caused by the latest ggplot2 version which is 0.9.0. This version doesn't have the function print.ggplot() which is found in ggplot2 version 0.8.9.

I tried to tinker with the rpy2 code to make it work with the newest ggplot2, but the extent of the changes seems to be quite large.

Meanwhile, just downgrade ggplot2 to version 0.8.9.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow