I'm trying to translate the following code into Rpy2 with no success:
neworder <- c("virginica","setosa","versicolor")
library("plyr")
iris2 <- arrange(transform(iris,
Species=factor(Species,levels=neworder)),Species)
This is meant to just change the factor order of a particular column, in this case Species.
I don't want to use plyr and all that stuff in Rpy2 too, since I can just modify the dataframe being plotted as a Python object. The following does not work:
# start with Python df 'mydf' and convert to R df
# to get mydf_r. The column equivalent of Species here
# is "variable"
# ...
mydf_r.variable = r.factor(ro.StrVector(["a", "b", "c"]))
# call ggplot...
ggplot2.ggplot(mydf_r) + ...
This does not work. How can I get the equivalent of the R code? That is, I have a melted dataframe with several values of variable plotted as c, b, a, and I want to change the order to a, b, c by changing the factor order of variable. Thanks.
Edit: I was able to change the order with this code:
labels = robj.StrVector(tuple(["a", "b", "c"]))
variable_factor = r.factor(labels, levels=labels)
r_melted = r.transform(r_melted, **{"variable": variable_factor})
p = ggplot2.ggplot(r_melted) + \
    ggplot2.geom_boxplot(aes_string(**{"x": "variable",
                                       "y": "value",
                                       "fill": "group"})) + \
    ggplot2.scale_fill_manual(values=np.array(["#00BA38", "#F8766D"])) + \
    ggplot2.coord_flip()
However, this breaks ggplot's ability to correctly make the boxplot and color-code it by the group variable. If I remove the lines:
labels = robj.StrVector(tuple(["a", "b", "c"]))
variable_factor = r.factor(labels, levels=labels)
r_melted = r.transform(r_melted, **{"variable": variable_factor})
Then it all works correctly. All I want is to change the order in which the variable values appear in the boxplot.
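One possible explanation for the broken plot, sketched here in plain Python rather than rpy2: the factor built from labels has only three elements, so assigning it as the variable column would replace the real per-row values instead of re-leveling them. The sketch assumes that R recycles the short vector across the rows; the `recycle` helper and the `original` column are hypothetical and exist only for this illustration.

```python
from itertools import cycle, islice

def recycle(short, n_rows):
    """Repeat `short` until it fills n_rows entries (models vector recycling)."""
    return list(islice(cycle(short), n_rows))

# Hypothetical contents of the "variable" column before the transform.
original = ["c", "c", "b", "a", "a", "b"]

# Assumption: assigning a 3-element factor to a 6-row column recycles it.
replaced = recycle(["a", "b", "c"], len(original))

# Count the rows whose value no longer matches the real data.
rows_changed = sum(o != r for o, r in zip(original, replaced))
print(replaced, rows_changed)
```

If that assumption holds, the level order is right but most rows now carry the wrong variable value, which would explain why the boxes and the fill by group no longer line up.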
@lgautier: the solution you gave looks like what I want, but it does not work for me here. I made a test case for it with the iris dataset:
Original plot:
import os
import pandas

# Load the iris data and copy the "Name" column to "Species"
iris = pandas.read_table(os.path.expanduser("~/iris.csv"),
                         sep=",")
iris["Species"] = iris["Name"]
r_melted = conversion_pydataframe(iris)
p = ggplot2.ggplot(r_melted) + \
    ggplot2.geom_boxplot(aes_string(**{"x": "PetalLength",
                                       "y": "PetalWidth",
                                       "fill": "Species"})) + \
    ggplot2.facet_grid(Formula("Species ~ .")) + \
    ggplot2.coord_flip()
p.plot()
produces:
But if I add:
labels = robj.StrVector(tuple(["versicolor", "virginica", "setosa"]))
variable_i = r_melted.names.index("Species")
r_melted[variable_i] = robj.FactorVector(r_melted[variable_i],
                                         levels=labels)
prior to plotting, I get:
I think this is because the names I use don't match the Species values exactly. It would be helpful if rpy2 raised an error when this happens. But in any case, what if I want to overwrite the names of the factor levels? That is, take the first factor name and make it x, the second y, etc., and have them displayed in that order? Is the only way to do that to make a new column with the correct names in the dataframe?
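To make the two cases above concrete, here is a minimal pure-Python model of how R's factor(x, levels=..., labels=...) treats its arguments. It is not rpy2 code: `model_factor` and the sample `column` values are hypothetical, chosen only to illustrate the behaviour. Values that match no level become NA (modelled as None) without any error, and labels rename the levels position by position while levels fixes the display order.

```python
def model_factor(values, levels, labels=None):
    """Model of R's factor(): return the displayed label per element,
    or None where R would produce NA because the value matches no level."""
    if labels is None:
        labels = levels
    # The order of `levels` is the order used for display.
    position = {level: i for i, level in enumerate(levels)}
    result = []
    for value in values:
        i = position.get(value)
        result.append(None if i is None else labels[i])
    return result

# Hypothetical column values, for illustration only.
column = ["Iris-setosa", "Iris-virginica", "Iris-setosa"]

# Levels that do not match the data exactly: every element becomes None (NA).
mismatched = model_factor(column, levels=["virginica", "setosa"])

# Matching levels, renamed to x and y in the requested order.
renamed = model_factor(column,
                       levels=["Iris-virginica", "Iris-setosa"],
                       labels=["x", "y"])
print(mismatched, renamed)
```

In R itself, passing labels alongside levels to factor() renames the levels in place, so a separate column should not be needed there. Whether robj.FactorVector accepts a labels argument in the same way is an assumption worth checking against the rpy2 documentation.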