I tried an example. For what it's worth, it agrees with the user's assertion that inserting rows into the data frame is also really slow. I don't quite understand what's going on, as I would have expected the cost of repeated allocation to outweigh the cost of copying. Can anyone either replicate this, or explain why the ordering below (in elapsed time: a single rbind < appending in a loop < insertion in a loop) would hold in general, or explain why this is not a representative example (e.g. the data frame is too small)?
Edit: the first time around I forgot to initialize the object in hell2fun to a data frame, so the code was doing matrix operations rather than data frame operations, and matrix operations are much faster. If I get a chance I'll extend the comparison to data frame vs. matrix. The qualitative assertions in the first paragraph hold, though.
N <- 1000
set.seed(101)
r <- matrix(runif(2*N),ncol=2)
## second circle of hell
hell2fun <- function() {
    df <- as.data.frame(rbind(r[1,])) ## initialize
    for (i in 2:N) {
        df <- rbind(df,r[i,])
    }
}
insertfun <- function() {
    df <- data.frame(x=rep(NA,N),y=rep(NA,N))
    for (i in 1:N) {
        df[i,] <- r[i,]
    }
}
rsplit <- as.list(as.data.frame(t(r)))
rbindfun <- function() {
    do.call(rbind,rsplit)
}
library(rbenchmark)
benchmark(hell2fun(),insertfun(),rbindfun())
##          test replications elapsed relative user.self
## 1  hell2fun()          100  32.439  484.164    31.778
## 2 insertfun()          100  45.486  678.896    42.978
## 3  rbindfun()          100   0.067    1.000     0.076
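For anyone who wants to try the data frame vs. matrix comparison mentioned in the edit, here is a minimal sketch. The names matinsertfun and dfinsertfun are made up for this illustration and are not part of the benchmark above. It uses base R's system.time instead of rbenchmark so that it runs without extra packages. I have not included timings, since they depend on the machine and R version.

```r
N <- 1000
set.seed(101)
r <- matrix(runif(2 * N), ncol = 2)

## hypothetical helper: preallocate a numeric matrix, then fill it row by row
matinsertfun <- function() {
    m <- matrix(NA_real_, nrow = N, ncol = 2)
    for (i in seq_len(N)) {
        m[i, ] <- r[i, ]
    }
    m
}

## hypothetical helper: the same loop on a preallocated data frame
dfinsertfun <- function() {
    df <- data.frame(x = rep(NA_real_, N), y = rep(NA_real_, N))
    for (i in seq_len(N)) {
        df[i, ] <- r[i, ]
    }
    df
}

m <- matinsertfun()
df <- dfinsertfun()

## sanity check: both versions should reproduce the original values
stopifnot(isTRUE(all.equal(m, r)))
stopifnot(isTRUE(all.equal(unname(as.matrix(df)), r)))

## timings are machine-dependent, so no expected output is shown
system.time(matinsertfun())
system.time(dfinsertfun())
```

Unlike the functions in the benchmark, both helpers return the filled object, so the result can be checked before comparing timings.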