Question

I have two data frames, df1 and df2, with the same size (rows and columns) and the same factors. Say:

df1 <- data.frame(a=c('alpha','beta','gamma'), b=c(1,2,3), c=c('x','y','z'), d=c(4,5,6))

      a b c d
1 alpha 1 x 4
2  beta 2 y 5
3 gamma 3 z 6

and

df2 <- data.frame(a=c('alpha','beta','gamma'), b=c(7,8,9), c=c('x','y','z'), d=c(10,11,12))

      a b c  d
1 alpha 7 x 10
2  beta 8 y 11
3 gamma 9 z 12

I would like to multiply these two data frames and get a result like this:

      a  b c  d
1 alpha  7 x 40
2  beta 16 y 55
3 gamma 27 z 72

I searched around and tried the following code:

M <- merge(df1,df2,by=c('a','c'))
S <- M[,grepl("*\\.x$",names(M))] * M[,grepl("*\\.y$",names(M))]
cbind(M[,c('a','c'),drop=FALSE],S)

This code works and gives the following:

      a c b.x d.x
1 alpha x   7  40
2  beta y  16  55
3 gamma z  27  72

Question: Is there a better way to do this multiplication? Keep in mind that my data frames have the same number of rows and columns and the same factor names. My real data frames are much larger, in both rows and columns.


Solution

Something like this, maybe? It multiplies each pair of columns when both are numeric and otherwise keeps the column from df1:

data.frame(
 Map(function(x,y) if(all(is.numeric(x),is.numeric(y))) x * y else x, df1, df2)
)

#      a  b c  d
#1 alpha  7 x 40
#2  beta 16 y 55
#3 gamma 27 z 72
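
For reuse, the same idea can be wrapped in a small function. This is only a sketch: the name multiply_numeric is hypothetical, and the up-front check assumes (as the question states) that both data frames have the same dimensions and column names.

```r
# Hypothetical helper (not from the original answer): multiply the numeric
# columns of two same-shaped data frames and keep the rest from the first.
multiply_numeric <- function(x, y) {
  # Assumption: same dimensions and same column names, as in the question.
  stopifnot(identical(dim(x), dim(y)), identical(names(x), names(y)))
  out <- Map(function(u, v) if (is.numeric(u) && is.numeric(v)) u * v else u,
             x, y)
  data.frame(out)
}

df1 <- data.frame(a = c("alpha", "beta", "gamma"), b = c(1, 2, 3),
                  c = c("x", "y", "z"), d = c(4, 5, 6))
df2 <- data.frame(a = c("alpha", "beta", "gamma"), b = c(7, 8, 9),
                  c = c("x", "y", "z"), d = c(10, 11, 12))

res <- multiply_numeric(df1, df2)
res
```

The check fails early if the two inputs are not aligned, rather than silently recycling or mismatching columns.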

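Some benchmarking on much wider data (50,000 columns sampled with replacement from the original four). lmfun is the Map approach above; johnfun is the select-and-multiply approach shown under OTHER TIPS: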

smp <- sample(1:4,50000,replace=TRUE)
df1big <- df1[,smp]
df2big <- df2[,smp]

lmfun <- function() {
  out <- data.frame(
    Map(function(x,y) if(all(is.numeric(x),is.numeric(y))) x * y else x,
        df1big, df2big)
  )
}
johnfun <- function() {
  sel <- sapply(df1big, is.numeric)
  df1big[,sel] <- df1big[,sel] * df2big[,sel]
}

system.time(lmfun())
#   user  system elapsed 
#   6.06    0.00    6.07 
system.time(johnfun())
#   user  system elapsed 
#  24.91    0.00   24.99

OTHER TIPS

Assuming the columns in the two data frames match, you can simply select the numeric ones and multiply them. This keeps the amount of non-vectorized R code to a minimum.

sel <- sapply(df1, is.numeric)
df1[,sel] <- df1[,sel] * df2[,sel]

Note that this overwrites df1; make a copy first if you need to keep the original.
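
A minimal sketch of the copy-first variant, using the example data from the question. The name result is just illustrative, and list-style indexing (`[sel]` rather than `[, sel]`) is a swapped-in choice so that a single numeric column is not dropped to a vector:

```r
df1 <- data.frame(a = c("alpha", "beta", "gamma"), b = c(1, 2, 3),
                  c = c("x", "y", "z"), d = c(4, 5, 6))
df2 <- data.frame(a = c("alpha", "beta", "gamma"), b = c(7, 8, 9),
                  c = c("x", "y", "z"), d = c(10, 11, 12))

result <- df1                        # work on a copy; df1 stays untouched
sel <- sapply(df1, is.numeric)       # TRUE for the numeric columns b and d
result[sel] <- df1[sel] * df2[sel]   # element-wise product of those columns
result
```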

If a column might be numeric in only one of the data frames, it is easy to adjust: require it to be numeric in both.

sel <- sapply(df1, is.numeric) & sapply(df2, is.numeric)
df1[,sel] <- df1[,sel] * df2[,sel]
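
To see the effect of the combined selector, here is a small made-up example (the data frames x and y are hypothetical, not from the question) in which column m is numeric in one data frame but character in the other, so it is left alone:

```r
x <- data.frame(id = c("r1", "r2"), n = c(2, 3), m = c(10, 20))
y <- data.frame(id = c("r1", "r2"), n = c(5, 6), m = c("p", "q"))

# Only columns that are numeric in BOTH data frames are selected.
sel <- sapply(x, is.numeric) & sapply(y, is.numeric)

out <- x
out[sel] <- x[sel] * y[sel]   # multiplies n only; id and m are kept from x
out
```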
Licensed under: CC-BY-SA with attribution