Functions for creating and reshaping big data in R using the FF package

Question 1

The function reshape does not explicitly exists for ffdf objects. But it is quite straightforward to execute with functionality from package ffbase. Just use ffdfdply from package ffbase, split by Subject and apply reshape inside the function.

An example on the Indometh dataset with 1000000 subjects.

require(ffbase)
require(datasets)
data(Indometh)

## Generate some random data
x <- expand.ffgrid(Subject = ff(factor(1:1000000)), time = ff(unique(Indometh$time)))
x$conc <- ffrandom(n=nrow(x), rfun = rnorm)
dim(x)
[1] 11000000        3

## and reshape to wide format
result <- ffdfdply(x=x, split=x$Subject, FUN=function(datawithseveralsplitelements){
  df <- reshape(datawithseveralsplitelements, 
              v.names = "conc", idvar = "Subject", timevar = "time", direction = "wide")
  as.data.frame(df)
})
class(result)
[1] "ffdf"
colnames(result)
[1] "Subject"   "conc.0.25" "conc.0.5"  "conc.0.75" "conc.1"    "conc.1.25" "conc.2"    "conc.3"    "conc.4"    "conc.5"    "conc.6"    "conc.8"   
dim(result)
[1] 1000000      12

Question 2

You would be hard put to construct a less efficient method than what you offer. Using rbind.data.frame is incredibly inefficient. Try this instead to create a six thousand line dataset for 6 subjects:

DF <- data.frame( Subj = rep( 1:6, each=1000), matrix(runif(6000*11), nrow=6000) )

Scaling it up to have a billion items (US billion, not UK billion) should give you about an 10GB object, so maybe trying 80 million lines or so?

I think asking for a tutorial in the ff-package is out-of-scope for SO. Please read the FAQ. Such questions are generally closed because the questioner demonstrates that they don't really know what they are talking about.