Question

Suppose I have two ffdf objects:

library(ff)
ff1 <- as.ffdf(data.frame(matrix(rnorm(10*10),ncol=10)))
ff2 <- ff1
colnames(ff2) <- 1:10

How can I column bind these without loading them into memory? cbind doesn't work.

The same question was asked at http://stackoverflow.com/questions/18355686/columnbind-ff-data-frames-in-r, but it has no MWE and the author abandoned it, so I am reposting.

Solution

You can use the following function, cbind.ffdf2, provided the column names of the two input ffdf objects contain no duplicates:

library(ff)
ff1 <- as.ffdf(data.frame(letA = letters[1:5], numA = 1:5))
ff2 <- as.ffdf(data.frame(letB = letters[6:10], numB = 6:10))

cbind.ffdf2 <- function(d1, d2){
  D1names <- colnames(d1)
  D2names <- colnames(d2)
  mergeCall <- do.call("ffdf", c(physical(d1), physical(d2)))
  colnames(mergeCall) <- c(D1names, D2names)
  mergeCall
}

cbind.ffdf2(ff1, ff2)[,]

Result:

  letA numA letB numB
1    a    1    f    6
2    b    2    g    7
3    c    3    h    8
4    d    4    i    9
5    e    5    j   10
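
The no-duplicates requirement can also be handled inside the function rather than by the caller. The following is only a sketch under assumptions: cbind_ffdf_unique is a hypothetical helper name (not part of ff), and it assumes both inputs are ffdf objects with the same number of rows whose columns are plain ff vectors. Clashing names are renamed with base R's make.unique() before the new ffdf is built.

```r
library(ff)

# Hypothetical helper (not part of ff): column-bind two ffdf objects,
# renaming clashing column names instead of requiring unique ones.
cbind_ffdf_unique <- function(d1, d2) {
  # Assumption: both inputs are ffdf objects with equal row counts
  stopifnot(is.ffdf(d1), is.ffdf(d2), nrow(d1) == nrow(d2))
  # Reuse the underlying ff vectors; the data stay on disk
  cols <- c(physical(d1), physical(d2))
  # make.unique() appends .1, .2, ... to repeated names
  names(cols) <- make.unique(c(colnames(d1), colnames(d2)))
  do.call("ffdf", cols)
}
```

Calling cbind_ffdf_unique(ff1, ff1) on the example above should then give the second copy's columns suffixed names such as letA.1 and numA.1.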

Other tips

Sorry for joining late. If you want to cbind an arbitrary number of ffdf objects without worrying about duplicate column names, you can try this (building on Audrey's solution).

library(ff)
ff1 <- as.ffdf(data.frame(letA = letters[1:5], numA = 1:5))
ff2 <- as.ffdf(data.frame(letA = letters[6:10], numB = 6:10))

cbind.ffdf2 <- function(...){
  argl <- list(...)
  if(length(argl) == 1L){
    return(argl[[1]])
  }
  # Collect the underlying ff vectors of every input
  physicalList <- NULL
  for(i in seq_along(argl)){
    # is.data.frame() is safe for objects with more than one class
    if(is.data.frame(argl[[i]])){
      # Coerce plain data frames to ffdf first
      physicalList <- c(physicalList, physical(as.ffdf(argl[[i]])))
    }else{
      physicalList <- c(physicalList, physical(argl[[i]]))
    }
  }
  # Build a new ffdf from the combined columns
  do.call("ffdf", physicalList)
}

cbind.ffdf2(ff1, ff2)

It also coerces any data frame in the argument list to an ffdf object.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow