Question

I have a huge data frame, and I grouped all my data by two columns. The problem is that when I use the lm function with ddply, I get the error "Error: cannot allocate vector of size 8.4 Mb". However, when I use it with other functions, such as mean, it works perfectly. Could you suggest something that fixes this problem, perhaps another function instead of ddply? By the way, I have already set the memory limit to the maximum:

    memory.limit(size=4000)

Here is an example:

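    library(plyr)

    a <- seq(1, 1000, 1)
    b <- seq(2, 1001, 2)  # 500 values; data.frame() recycles them to 1000 rows
    c <- c(rep(1, 250), rep(2, 250), rep(3, 250), rep(4, 250))
    d <- c(rep(5, 250), rep(6, 250), rep(7, 250), rep(8, 250))
    df <- data.frame(a, b, c, d)

    # Fit a ~ b separately for each (c, d) group
    dafr <- dlply(df, .(c, d), lm, formula = a ~ b)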

In my experience, converting a data frame to a data.table helps, but I do not know how to use lm in the data.table framework.

Thanks for your attention.

Solution

If you only need the coefficients, you can try this:

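    library(data.table)
    setDT(df)  # convert df to a data.table by reference (no copy)
    # lm.fit() works on a plain design matrix and returns far less than lm();
    # cbind(1, b) adds the intercept column by hand
    dafr <- df[, as.list(lm.fit(cbind(1, b), a)$coefficients), by = list(c, d)]
    setnames(dafr, c("c", "d", "intercept", "slope"))
    #   c d    intercept slope
    #1: 1 5 1.869449e-13   0.5
    #2: 2 6 5.176935e-13   0.5
    #3: 3 7 5.000000e+02   0.5
    #4: 4 8 5.000000e+02   0.5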
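
If you would rather call lm itself inside data.table, one possible approach is to fit the model per group and return only the numbers you need, so no full lm object is kept per group. This is only a minimal sketch: fit_stats is a hypothetical helper name (not part of data.table), the data rebuild the toy example from the question, and I have not measured how much memory this saves on a large table.

```r
library(data.table)

# Toy data from the question, with the recycling of b written out explicitly
dt <- data.table(
  a = seq(1, 1000, 1),
  b = rep(seq(2, 1000, 2), 2),
  c = rep(1:4, each = 250),
  d = rep(5:8, each = 250)
)

# Hypothetical helper: fit a ~ b on one group's rows and keep only
# the two coefficients and the group size, not the lm object itself
fit_stats <- function(sd) {
  fit <- lm(a ~ b, data = sd)
  cf <- coef(fit)
  list(intercept = cf[[1]], slope = cf[[2]], n = nrow(sd))
}

# .SD holds the rows of the current (c, d) group (columns a and b)
res <- dt[, fit_stats(.SD), by = .(c, d)]
print(res)
```

The result has one row per (c, d) group, just like the lm.fit version. Since lm builds a model frame and other components that lm.fit skips, the lm.fit approach is probably still the lighter one if coefficients are all you need.
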
Licensed under: CC-BY-SA with attribution