You stated you have 'some million' rows, so here is an excerpt of benchmarks with 3 columns of 10 million rows:
R 3.0.3 (on 32bit Celeron system w/ 2GB memory)
## Unit: microseconds
## expr min lq median uq max neval
## dt.addC(dt) 35.38 56.03 64.82 67.77 185.2 100
## df.add(df) 181.43 214.80 221.42 229.81 366.6 100
## dt.addB(dt) 2359.54 2457.09 2513.11 2577.00 6398.0 100
## dt.addA(dt) 2913.74 2995.64 3047.29 3125.82 6791.1 100
R 3.1.0 (on 64bit Haswell i7 w/ 24 GB memory)
## Unit: microseconds
## expr min lq median uq max neval
## df.add(df) 10.25 30.74 33.36 48.53 84.25 100
## dt.addC(dt) 27120.45 27563.79 27990.22 29642.46 87637.63 100
## dt.addB(dt) 38452.71 39018.90 46225.69 50142.46 130893.53 100
## dt.addA(dt) 193268.78 247749.71 251380.74 256380.43 440916.17 100
Note:
The difference between data.frame and data.table on 3.1.0 can be explained by the new way that R 3.1.0 handles assignments. Arun (one of the data.table authors) explains it in this chat log.
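To make the note a little more concrete: as I understand it, from 3.1.0 onward R tries to shallow-copy a list (and so a data.frame) when one of its components is assigned, instead of duplicating every column. The sketch below is only an illustration of that idea, not something taken from the chat log. The size `n` and the column `a` are made up for the example, and `address()` from data.table is used purely to print memory addresses.

```r
library(data.table)   # used here only for address()

n <- 1e5                       # assumed size, chosen for illustration
df <- data.frame(a = runif(n)) # hypothetical existing column
b.vals <- runif(n)

before <- address(df$a)        # address of the existing column vector
df$b <- b.vals                 # base assignment of a new column
after <- address(df$a)         # address of the same column afterwards

# On R >= 3.1.0 the two addresses are expected to match, meaning column `a`
# was not duplicated; older versions would typically have copied it.
print(c(before = before, after = after))
```

If the addresses match on your system, adding a column the base way costs almost nothing regardless of row count, which would be consistent with the flat df.add timings on 3.1.0.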
df.add (a common base way to add columns to a data.frame).
df$b <- b.vals
df$c <- c.vals
dt.addA (the common base data.frame method applied to a data.table)
dt$b <- b.vals
dt$c <- c.vals
dt.addB (a common data.table way to add columns)
dt[,`:=`(b=b.vals, c=c.vals)]
dt.addC (another data.table method of setting values [from Arun])
## to reduce the overhead due to `[.data.table` on small data.frames.
set(dt, j="b", value=b.vals)
set(dt, j="c", value=c.vals)
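The snippets above are only the bodies of the four benchmarked functions. Here is a minimal sketch of how they might be wrapped and timed with microbenchmark (which appears in the session info). The row count `n`, the starting column `a`, the `a.vals` vector and the exact wrapper layout are my assumptions for illustration, not the original benchmark script.

```r
library(data.table)
library(microbenchmark)

n <- 1e4                      # assumed row count; raise toward 1e7 as memory allows
a.vals <- runif(n)            # hypothetical starting column
b.vals <- runif(n)
c.vals <- runif(n)

df <- data.frame(a = a.vals)
dt <- data.table(a = a.vals)

# Base assignment on a data.frame; the function works on its own binding,
# so the caller's df keeps only column `a`.
df.add <- function(df) {
  df$b <- b.vals
  df$c <- c.vals
  df
}

# The same base-style assignment applied to a data.table.
dt.addA <- function(dt) {
  dt$b <- b.vals
  dt$c <- c.vals
  dt
}

# data.table's `:=`, which adds the columns by reference.
dt.addB <- function(dt) {
  dt[, `:=`(b = b.vals, c = c.vals)]
}

# data.table's set(), which avoids the `[.data.table` overhead.
dt.addC <- function(dt) {
  set(dt, j = "b", value = b.vals)
  set(dt, j = "c", value = c.vals)
}

res <- microbenchmark(
  df.add(df), dt.addA(dt), dt.addB(dt), dt.addC(dt),
  times = 100L
)
print(res)
```

One caveat with this layout: `:=` and `set()` change the caller's `dt` by reference, so the first such call adds `b` and `c` and later iterations overwrite them. Passing `copy(dt)` would give each run a fresh table, at the price of timing the copy as well.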
Benchmarks for other data set sizes
R 3.1.0 on i7 System
# Test @ 1,000
## Unit: microseconds
## expr min lq median uq max neval
## dt.addC(dt) 6.007 10.38 11.71 12.50 20.79 100
## df.add(df) 11.534 19.49 20.57 21.32 940.63 100
## dt.addB(dt) 326.166 344.85 351.43 365.47 1412.86 100
## dt.addA(dt) 798.777 850.47 867.60 888.23 1935.20 100
## test relative
## 1 df.add(df) 1
## 4 dt.addC(dt) 1
## 3 dt.addB(dt) 35
## 2 dt.addA(dt) 87
# Test @ 10,000
## Unit: microseconds
## expr min lq median uq max neval
## dt.addC(dt) 11.13 17.88 19.20 20.80 988.9 100
## df.add(df) 10.97 20.56 22.65 24.94 41.1 100
## dt.addB(dt) 333.17 364.15 389.87 419.08 1347.0 100
## dt.addA(dt) 823.99 875.88 897.10 1076.90 29233.1 100
## test relative
## 1 df.add(df) 1
## 4 dt.addC(dt) 1
## 3 dt.addB(dt) 19
## 2 dt.addA(dt) 50
# Test @ 10,000,000
## Unit: microseconds
## expr min lq median uq max neval
## df.add(df) 10.25 30.74 33.36 48.53 84.25 100
## dt.addC(dt) 27120.45 27563.79 27990.22 29642.46 87637.63 100
## dt.addB(dt) 38452.71 39018.90 46225.69 50142.46 130893.53 100
## dt.addA(dt) 193268.78 247749.71 251380.74 256380.43 440916.17 100
## test relative
## 1 df.add(df) 1
## 4 dt.addC(dt) 1536
## 3 dt.addB(dt) 2213
## 2 dt.addA(dt) 11667
R 3.0.3 on Celeron System
# Test @ 1,000
## Unit: microseconds
## expr min lq median uq max neval
## dt.addC(dt) 55.78 82.58 94.48 96.14 176.1 100
## df.add(df) 182.65 215.36 220.10 225.03 361.6 100
## dt.addB(dt) 2699.10 2774.61 2827.34 2894.23 3442.2 100
## dt.addA(dt) 5259.89 6066.00 6122.37 6231.50 10265.9 100
## test relative
## 4 dt.addC(dt) 1.000
## 1 df.add(df) 2.889
## 3 dt.addB(dt) 32.444
## 6 dfadd2dtB(dt) 69.667
## 2 dt.addA(dt) 69.889
## 5 dfadd2dtA(dt) 96.000
# Test @ 10,000
## Unit: microseconds
## expr min lq median uq max neval
## dt.addC(dt) 134.0 162.8 168.7 185.8 4135 100
## df.add(df) 576.7 616.4 633.7 663.2 72749 100
## dt.addB(dt) 2789.8 2932.6 2993.0 3054.7 6702 100
## dt.addA(dt) 5400.6 6701.5 6819.0 10079.2 11518 100
## test relative
## 4 dt.addC(dt) 1.000
## 1 df.add(df) 8.143
## 3 dt.addB(dt) 14.619
## 2 dt.addA(dt) 34.286
## 6 dfadd2dtB(dt) 34.381
## 5 dfadd2dtA(dt) 53.810
# Test @ 10,000,000
## Unit: milliseconds
## expr min lq median uq max neval
## dt.addC(dt) 121.1 146.2 147.2 161.8 303.8 100
## dt.addB(dt) 197.7 225.4 228.0 270.2 380.7 100
## df.add(df) 767.8 823.5 857.0 938.2 1156.9 100
## dt.addA(dt) 709.6 1071.9 1112.6 1170.1 1343.9 100
## test relative
## 4 dt.addC(dt) 1.000
## 3 dt.addB(dt) 1.566
## 1 df.add(df) 6.172
## 2 dt.addA(dt) 7.594
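For reading the `relative` columns: each value is a timing divided by the fastest timing in that test. The sketch below redoes that arithmetic from the medians of the Celeron 10,000,000-row table above. I am assuming the `relative` tables came from separate rbenchmark runs (rbenchmark is attached in the session info), so the ratios of medians should land near the reported values without matching them exactly.

```r
# Medians (milliseconds) copied from the Celeron 10,000,000-row table above.
medians <- c(dt.addC = 147.2, dt.addB = 228.0, df.add = 857.0, dt.addA = 1112.6)

# Scale every median by the fastest one, so the fastest method becomes 1.
rel <- medians / min(medians)
print(round(rel, 3))
```

The same calculation on the i7 10,000,000-row medians shows why the ratios there run into the hundreds and thousands: df.add stays in the tens of microseconds while the data.table methods take tens of milliseconds.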
System/Session Info...
Intel® Core™ i7-4700MQ Processor
24 GB
## R version 3.1.0 (2014-04-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rbenchmark_1.0.0 microbenchmark_1.3-0 data.table_1.9.2
##
## "Linux" "3.11.0-19-generic" "x86_64"
Intel(R) Celeron(R) CPU 2.53GHz
2 GB
## R version 3.0.3 (2014-03-06)
## Platform: i686-pc-linux-gnu (32-bit)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rbenchmark_1.0.0 microbenchmark_1.3-0 data.table_1.9.2
## [4] knitr_1.5
##
## "Linux" "3.2.0-60-generic-pae" "i686"