Question

I have a large set of scaling factors I wish to apply to a data frame. These factors are specific to the group (batch) a sample comes from and to each variable of the sample. I have tried to construct a minimal example for this question.

SCALING FACTORS

Batch A     B
Q     1.01  1.31
R     0.90  1.22
S     1.04  1.09

DATA

Batch A     B
Q     23    10
Q     22    11
R     27    12
R     26    13
S     22    14
S     24    15

So, for example, the first batch Q sample would go from 23, 10 to 23.23, 13.1.

I realise that there could be an apply somewhere in the solution to this but I am struggling to work out where to start. Any help much appreciated :-)

scaling_factors_example <- data.frame(Batch = c("Q", "R", "S"), A = c(1.01, 0.9, 1.04), B = c(1.31, 1.22, 1.09))

data_example <- data.frame(Batch = c("Q", "Q", "R", "R", "S", "S"), A = c(23, 22, 27, 26, 22, 24), B = c(10, 11, 12, 13, 14, 15))

Solution

A riff on Mark's answer (borrowing his abbreviations: s is scaling_factors_example, d is data_example), except that it uses match instead of merge, as that is often much faster for many-to-one (N-1) joins:

d[, -1] <- d[, -1] * s[match(d[, 1], s[, 1]), -1]

which produces

#   Batch     A     B
# 1     Q 23.23 13.10
# 2     Q 22.22 14.41
# 3     R 24.30 14.64
# 4     R 23.40 15.86
# 5     S 22.88 15.26
# 6     S 24.96 16.35

match returns, for each value in its first argument, the position of its first occurrence in the second argument, which effectively lets you do a many-to-one (N-1) merge, as is the case here. And as I noted, it's faster, which may matter if you are joining large tables:
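To make the lookup step concrete, here is a minimal sketch on the question's data. The names idx and factors are only illustrative and do not appear in the answer's one-liner.

```r
# s and d are the abbreviations used in this answer for the question's data
s <- data.frame(Batch = c("Q", "R", "S"),
                A = c(1.01, 0.9, 1.04), B = c(1.31, 1.22, 1.09))
d <- data.frame(Batch = c("Q", "Q", "R", "R", "S", "S"),
                A = c(23, 22, 27, 26, 22, 24), B = c(10, 11, 12, 13, 14, 15))

# For each batch label in d, the row of s that holds its scaling factors
idx <- match(d[, 1], s[, 1])
idx
# [1] 1 1 2 2 3 3

# Indexing s by idx repeats its rows so they line up one-to-one with d
factors <- s[idx, -1]
dim(factors)
# [1] 6 2

# A batch missing from s would give NA in idx, and NA after scaling
anyNA(idx)
# [1] FALSE
```

Because the lookup goes row by row, d does not need to be sorted by Batch for this to work.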

library(microbenchmark)
microbenchmark(s[match(d[, 1], s[, 1]), -1])

# Unit: microseconds
#     min      lq   median      uq     max neval
# 167.854 173.706 176.6315 181.019 279.025   100

microbenchmark(merge(d[, 1, drop = FALSE], s, "Batch"))

# Unit: microseconds
#     min       lq   median       uq      max neval
# 983.353 1060.149 1068.195 1103.302 2181.004   100

Side note: if you have large tables, consider data.table for merges, as it can be even faster than match under some circumstances.
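For reference, one way this could look with data.table is an update join. This is only a sketch: it assumes the data.table package is installed, the names dt and st are made up for illustration, and the columns are spelled out by hand, which would need generalising for a large set of variables.

```r
library(data.table)

# The question's data as data.tables (dt and st are illustrative names)
dt <- data.table(Batch = c("Q", "Q", "R", "R", "S", "S"),
                 A = c(23, 22, 27, 26, 22, 24), B = c(10, 11, 12, 13, 14, 15))
st <- data.table(Batch = c("Q", "R", "S"),
                 A = c(1.01, 0.9, 1.04), B = c(1.31, 1.22, 1.09))

# Update join: rows of dt are matched to st on Batch and modified in place.
# Inside j, A and B are dt's columns; the i. prefix refers to st's columns.
dt[st, on = "Batch", `:=`(A = A * i.A, B = B * i.B)]
dt
```

Since := modifies dt by reference, no reassignment is needed.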

Other tips

It's easier, I think, to use merge rather than the apply family (s is scaling_factors_example, d is data_example):

m <- merge(d[, 1, drop = FALSE], s, "Batch")
d[-1] <- m[-1] * d[-1]
d

  Batch     A     B
1     Q 23.23 13.10
2     Q 22.22 14.41
3     R 24.30 14.64
4     R 23.40 15.86
5     S 22.88 15.26
6     S 24.96 16.35

Explanation

merge gives you a data frame with the same number of rows as your data, containing the corresponding scaling factors for each row. Now you can simply multiply the columns.
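One caveat worth illustrating: by default merge returns its rows sorted by the key column, so the row-by-row multiplication lines up with d only if d is in the same order, as it is in the question. The sketch below shuffles the question's rows on purpose and sorts d first. The shuffled data and the stopifnot guard are additions for illustration, not part of the answer above.

```r
s <- data.frame(Batch = c("Q", "R", "S"),
                A = c(1.01, 0.9, 1.04), B = c(1.31, 1.22, 1.09))

# The question's rows, deliberately not ordered by Batch
d <- data.frame(Batch = c("S", "Q", "R", "Q", "S", "R"),
                A = c(22, 23, 27, 22, 24, 26), B = c(14, 10, 12, 11, 15, 13))

# Put d into the order that merge() will return its rows in
d <- d[order(d$Batch), ]

m <- merge(d[, 1, drop = FALSE], s, by = "Batch")

# Guard: stop if the batches do not line up row by row
stopifnot(identical(as.character(m$Batch), as.character(d$Batch)))

d[-1] <- m[-1] * d[-1]
d
```

If the original row order of d must be preserved, the match approach avoids the issue, since it looks up the factors row by row.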

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow