Can I avoid using data frames in ggplot2?

https://stackoverflow.com/questions/2063821

r
ggplot2

20-09-2019

Question

I'm running a Monte Carlo simulation and the output is in the form:

> d = data.frame(iter=seq(1, 2), k1 = c(0.2, 0.6), k2=c(0.3, 0.4))
> d
iter  k1   k2
1     0.2  0.3
2     0.6  0.4

The plots I want to generate are:

plot(d$iter, d$k1)
plot(density(d$k1))

I know how to produce equivalent plots with ggplot2 by first converting to a long-format data frame:

new_d = data.frame(iter=rep(d$iter, 2), 
                   k = c(d$k1, d$k2), 
                   label = rep(c('k1', 'k2'), each=2))

Then plotting is easy. However, the number of iterations can be very large, and the number of k's can also be large. This means messing about with a very large data frame.

Is there any way I can avoid creating this new data frame?

Thanks

Solution

The short answer is "no": you can't avoid creating a data frame. ggplot2 requires its data to be in a data frame. If you use qplot, you can pass separate vectors for x and y, but internally it still builds a data frame from the arguments you pass in.

I agree with juba's suggestion: learn to use the reshape function, or better yet the reshape package with its melt/cast functions. Once you are quick at putting your data in long format, amazing ggplot graphs are one step closer!
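As a minimal sketch of that long-format workflow: the example below assumes the reshape2 package is installed (it is the successor to reshape and is not named in the answer above). Its melt function stacks every k column in one call, however many there are. The column names label and k are chosen here only to match new_d from the question.

```r
library(reshape2)
library(ggplot2)

d <- data.frame(iter = seq(1, 2), k1 = c(0.2, 0.6), k2 = c(0.3, 0.4))

# Stack all non-id columns (k1, k2, ...) into label/k pairs.
long <- melt(d, id.vars = "iter", variable.name = "label", value.name = "k")

# A single layer now covers every k; colour tells them apart.
p <- ggplot(long, aes(x = iter, y = k, colour = label)) + geom_point()
```

Because melt treats every column not listed in id.vars as a measurement, the same call should keep working when more k columns are added.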

OTHER TIPS

Yes, it is possible for you to avoid creating a data frame: just give an empty argument list to the base layer, ggplot(). Here is a complete example based on your code:

library(ggplot2)

d = data.frame(iter=seq(1, 2), k1 = c(0.2, 0.6), k2=c(0.3, 0.4))
# desired plots:
# plot(d$iter, d$k1)
# plot(density(d$k1))

ggplot() + geom_point(aes(x = d$iter, y = d$k1))
# there is not enough data for a good density plot,
# but this is how you would do it:
ggplot() + geom_density(aes(d$k1))

Note that although this lets you avoid creating a data frame yourself, one may still be created internally. See, e.g., the following extract from ?geom_point:

All objects will be fortified to produce a data frame.
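One way to check this for yourself is sketched below using ggplot2's layer_data() helper. Even when the plot is built from bare vectors, the data the layer ends up working with can be inspected and tested for being a data frame.

```r
library(ggplot2)

d <- data.frame(iter = seq(1, 2), k1 = c(0.2, 0.6), k2 = c(0.3, 0.4))

# No data argument: the aesthetics are evaluated from plain vectors.
p <- ggplot() + geom_point(aes(x = d$iter, y = d$k1))

# layer_data() returns the data computed for a layer (the first by default).
built <- layer_data(p)
is.data.frame(built)
```

If the last line returns TRUE, the vectors were assembled into a data frame during the build, so this approach saves typing rather than memory.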

You can use the reshape function to transform your data frame to "long" format. It may be a bit faster than your code.

R> reshape(d, direction="long",varying=list(c("k1","k2")),v.names="k",times=c("k1","k2"))
     iter time   k id
1.k1    1   k1 0.2  1
2.k1    2   k1 0.6  2
1.k2    1   k2 0.3  1
2.k2    2   k2 0.4  2
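Whether reshape is actually faster can be measured rather than guessed. The sketch below is only an illustration: the size n and the random data are assumptions, not the asker's simulation. It times both approaches with system.time on a larger data frame. No timings are quoted here because they depend on the machine and the R version.

```r
set.seed(1)
n <- 10000  # assumed size, for illustration only
d <- data.frame(iter = seq_len(n), k1 = runif(n), k2 = runif(n))

# Manual construction, as in the question.
manual_time <- system.time(
  manual <- data.frame(iter = rep(d$iter, 2),
                       k = c(d$k1, d$k2),
                       label = rep(c("k1", "k2"), each = n))
)

# Base R reshape, as in this answer.
reshape_time <- system.time(
  reshaped <- reshape(d, direction = "long",
                      varying = list(c("k1", "k2")),
                      v.names = "k", times = c("k1", "k2"))
)

print(manual_time)
print(reshape_time)
```

Both results hold the same k values in the same order, so either can be passed to ggplot; only the column names differ (label versus time).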

Just to add to the previous answers: with qplot you could do

p <- qplot(y=d$k2, x=d$k1)

and then build on it from there, e.g. with

p + theme_bw()

But I agree - melt/cast is generally the way forward.

Licensed under: CC-BY-SA with attribution