Question

I'm trying to translate the following code into Rpy2 with no success:

neworder <- c("virginica","setosa","versicolor")
library("plyr")
iris2 <- arrange(transform(iris,
             Species=factor(Species,levels=neworder)),Species)

This is meant to just change the factor order of a particular column, in this case Species.

I'd rather not pull plyr into Rpy2 as well, since I can just modify the dataframe being plotted as a Python object. The following is what I tried:

# start with Python df 'mydf' and convert to R df
# to get mydf_r. The column equivalent of Species here
# is "variable"
# ...
mydf_r.variable = r.factor(ro.StrVector(["a", "b", "c"]))
# call ggplot...
ggplot2.ggplot(mydf) + ...

This does not work. How can I get the equivalent of the R code? I.e. I have a melted dataframe with several values of variable plotted as c, b, a and I want to change the order to be a, b, c by changing the factor order of variable. Thanks.

Edit: I was able to change the order with this code:

labels = robj.StrVector(tuple(["a", "b", "c"]))
variable_factor = r.factor(labels, levels=labels)
r_melted = r.transform(r_melted, **{"variable": variable_factor})
p = ggplot2.ggplot(r_melted) + \
    ggplot2.geom_boxplot(aes_string(**{"x": "variable",
                                       "y": "value",
                                        "fill": "group"})) + \
    ggplot2.scale_fill_manual(values=np.array(["#00BA38", "#F8766D"])) + \
    ggplot2.coord_flip()

However, this breaks ggplot's ability to correctly make the boxplot and color code it by group variable. If I remove the lines:

labels = robj.StrVector(tuple(["a", "b", "c"]))
variable_factor = r.factor(labels, levels=labels)
r_melted = r.transform(r_melted, **{"variable": variable_factor})

Then it all works correctly... all I want is to change the order in which the variable values appear in the boxplot.

@lgautier: the solution you gave looks like what I want, but it does not work for me here. I made a test case for it with the iris dataset:

Original plot:

import os
iris = pandas.read_table(os.path.expanduser("~/iris.csv"),
                         sep=",")
iris["Species"] = iris["Name"]
r_melted = conversion_pydataframe(iris)
p = ggplot2.ggplot(r_melted) + \
    ggplot2.geom_boxplot(aes_string(**{"x": "PetalLength",
                                       "y": "PetalWidth",
                                       "fill": "Species"})) + \
    ggplot2.facet_grid(Formula("Species ~ .")) + \
    ggplot2.coord_flip()
p.plot()

produces:

[image: plot produced by the code above]

But if I add:

labels = robj.StrVector(tuple(["versicolor", "virginica", "setosa"]))
variable_i = r_melted.names.index("Species")
r_melted[variable_i] = robj.FactorVector(r_melted[variable_i],
                                         levels=labels)

prior to plotting, I get:

[image: plot produced after setting the factor levels]

I think this is because the level names I pass don't exactly match the values in the Species column. It would be helpful if rpy2 raised an error when this happens. But in any case, what if I want to overwrite the names of the factor? I.e. take the first factor name and make it x, the second y, etc., and have them displayed in that order? Is the only way to do that to add a new column with the correct names to the dataframe?


Solution

You need to change the levels of the factor used, either on the fly inside the aesthetic mapping (first example below) or in the column of the data frame itself (second example).

If labels is a relatively short list the following will just work:

# r_melted is the one defined upstream of your code snippet,
# not the results of calling r.transform()
labels = robj.StrVector(tuple(["a", "b", "c"]))
p = ggplot2.ggplot(r_melted) + \
    # labels.r_repr() renders the vector as R source, i.e. c("a", "b", "c")
    ggplot2.geom_boxplot(aes_string(**{"x": "factor(variable, levels = %s)" % labels.r_repr(),
                                       "y": "value",
                                       "fill": "group"})) + \
    ggplot2.scale_fill_manual(values=np.array(["#00BA38", "#F8766D"])) + \
    ggplot2.coord_flip()

If labels is longer (or you would rather not embed any R code):

# r_melted is the one defined upstream of your code snippet,
# not the results of calling r.transform()
from rpy2.robjects.vectors import FactorVector
variable_i = r_melted.names.index('variable')
r_melted[variable_i] = FactorVector(r_melted[variable_i],
                                    levels = robj.StrVector(tuple(["a", "b", "c"])))
p = ggplot2.ggplot(r_melted) + \
    ggplot2.geom_boxplot(aes_string(**{"x": "variable",
                                       "y": "value",
                                       "fill": "group"})) + \
    ggplot2.scale_fill_manual(values=np.array(["#00BA38", "#F8766D"])) + \
    ggplot2.coord_flip()
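
On the follow-up about mismatched names: in R, factor() turns every value that is not listed in levels into NA without raising an error, and its labels argument renames the levels while keeping their order. The sketch below is only a plain-Python model of that behaviour, not rpy2 code; make_factor and as_strings are hypothetical helpers written for illustration, and the "Iris-..." values are an assumption about what the Name column of your CSV contains. As far as I know, rpy2's FactorVector accepts a labels argument mirroring R's factor(), so the same idea should carry over without adding a new column.

```python
def make_factor(values, levels, labels=None):
    """Model R's factor(values, levels=levels, labels=labels) in plain Python.

    Returns (codes, level_names). Codes are 1-based positions into `levels`;
    None stands in for NA when a value is not among the levels.
    """
    if labels is None:
        labels = levels
    if len(labels) != len(levels):
        raise ValueError("labels and levels must have the same length")
    position = {level: i + 1 for i, level in enumerate(levels)}
    codes = [position.get(v) for v in values]
    return codes, list(labels)


def as_strings(codes, level_names):
    """Map codes back to the displayed level names (None stays None)."""
    return [None if c is None else level_names[c - 1] for c in codes]


# Assumed column content; check the actual values in your data frame.
species = ["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-setosa"]

# Levels that do not match the data exactly: every entry becomes NA (None).
codes, names = make_factor(species, ["versicolor", "virginica", "setosa"])
print(codes)

# Matching levels in the desired order, displayed under the new names x, y, z.
codes, names = make_factor(
    species,
    levels=["Iris-versicolor", "Iris-virginica", "Iris-setosa"],
    labels=["x", "y", "z"],
)
print(as_strings(codes, names))
```

The first call shows why the plot changes when the level names are slightly off: all values silently become missing. The second call shows the renaming case: the level order (and therefore the plotting order) follows levels, while the displayed names come from labels.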
Licensed under: CC-BY-SA with attribution