Question

Suppose I have two ffdf objects:

library(ff)
ff1 <- as.ffdf(data.frame(matrix(rnorm(10*10),ncol=10)))
ff2 <- ff1
colnames(ff2) <- 1:10

How can I column bind these without loading them into memory? cbind doesn't work.

The same question was asked at http://stackoverflow.com/questions/18355686/columnbind-ff-data-frames-in-r, but it has no MWE and its author abandoned it, so I am reposting.


Solution

You can use the following function, cbind.ffdf2, provided that the column names of the two input ffdf objects do not overlap:

library(ff)
ff1 <- as.ffdf(data.frame(letA = letters[1:5], numA = 1:5))
ff2 <- as.ffdf(data.frame(letB = letters[6:10], numB = 6:10))

cbind.ffdf2 <- function(d1, d2){
  D1names <- colnames(d1)
  D2names <- colnames(d2)
  mergeCall <- do.call("ffdf", c(physical(d1), physical(d2)))
  colnames(mergeCall) <- c(D1names, D2names)
  mergeCall
}

cbind.ffdf2(ff1, ff2)[,]

Result:

   letA numA letB numB
1   a    1    f     6
2   b    2    g     7
3   c    3    h     8
4   d    4    i     9
5   e    5    j    10
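
The precondition mentioned in this answer (no duplicate column names) can be checked before binding. The following is only a minimal sketch: check_cbind_inputs is a hypothetical helper, not part of ff, and the row-count check is an added assumption about what a column bind needs. It uses only colnames() and nrow(), so it is demonstrated here on plain data frames.

```r
# Hypothetical helper (not part of ff): stop early if two tables cannot be
# column-bound safely. It relies only on colnames() and nrow().
check_cbind_inputs <- function(d1, d2) {
  # Column names present in both inputs would end up duplicated
  clashes <- intersect(colnames(d1), colnames(d2))
  if (length(clashes) > 0L) {
    stop("duplicate column names: ", paste(clashes, collapse = ", "))
  }
  # Assumption: both inputs must have the same number of rows
  if (nrow(d1) != nrow(d2)) {
    stop("row counts differ: ", nrow(d1), " vs ", nrow(d2))
  }
  invisible(TRUE)
}

# Plain data frames are used so the sketch runs without ff installed;
# with ff loaded, the intended call would be check_cbind_inputs(ff1, ff2).
a <- data.frame(letA = letters[1:5], numA = 1:5)
b <- data.frame(letB = letters[6:10], numB = 6:10)
check_cbind_inputs(a, b)
```

Calling the helper first and cbind.ffdf2 second keeps the binding function itself unchanged.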

Other tips

Sorry for joining this late. If you want to cbind an arbitrary number of ffdf objects without worrying about duplicate columns, you can try this (building on Audrey's solution).

ff1 <- as.ffdf(data.frame(letA = letters[1:5], numA = 1:5))
ff2 <- as.ffdf(data.frame(letA = letters[6:10], numB = 6:10))

cbind.ffdf2 <- function(...){
  argl <- list(...)
  if(length(argl) == 1L){
    return(argl[[1]])
  }else{
    physicalList <- NULL
    for(i in seq_along(argl)){
      # Coerce plain data frames to ffdf before extracting their ff columns
      if(is.data.frame(argl[[i]])){
        physicalList <- c(physicalList, physical(as.ffdf(argl[[i]])))
      }else{
        physicalList <- c(physicalList, physical(argl[[i]]))
      }
    }
    # Build a single ffdf from the collected ff vectors
    mergeCall <- do.call("ffdf", physicalList)
    return(mergeCall)
  }
}

cbind.ffdf2(ff1, ff2)

It also coerces any data frame in the argument list to an ffdf object.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow