Although this is not exactly an answer to what the OP asks, 5 million rows are not really too much to read. Of course base R's read.table will be very slow, but fread from the data.table package is fast enough. Here are the benchmarks:
# one-row sample table in the format from the question
tbl <- read.table(header=TRUE, stringsAsFactors=FALSE, text='Date PI SSC GC
2/11/2013 0.52 0.89 4.2')
require(data.table)
# create a huge data.table with 5 million rows to write to a temp file
bigtbl <- rbindlist(lapply(1:(5*1e6), function(x) tbl))
write.table(bigtbl, row.names=FALSE, quote=FALSE, file="temp.txt")
# benchmark: read the 5 million row file back using fread
system.time(bigtbl2 <- fread('temp.txt'))
## Read 5000000 rows and 4 (of 4) columns from 0.116 GB file in 00:00:11
## user system elapsed
## 10.76 0.08 10.86
Of course memory size may still be a concern, but in this case the table takes only 153 MB:
> tables()
     NAME        NROW  MB COLS           KEY
[1,] bigtbl2 5,000,000 153 Date,PI,SSC,GC
Total: 153MB
If you are going to read this data frequently, it makes sense to save it to a standard RData file using the save function and read it back with load, which avoids re-parsing the text file every time.
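A minimal sketch of that, assuming a file name of "big.RData" (chosen here just for illustration):

# save the parsed table once to a binary RData file
save(bigtbl2, file = "big.RData")
# in a later session, restore the object under its original name
load("big.RData")

Note that load restores the object under the name it was saved with (bigtbl2 here).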