Use array result as multiplier for the original data frame
Question
For a given data frame, I would like to multiply a column of the data frame by values taken from an array. Each row of the data frame contains a name, a numerical value and two factor values:
name credit gender group
n1 10 m A
n2 20 f B
n3 30 m A
n4 40 m B
n5 50 f C
This data frame can be generated using the commands:
name <- c('n1','n2','n3','n4','n5')
credit <- c(10,20,30,40,50)
gender <- c('m','f','m','m','f')
group <- c('A','B','A','B','C')
DF <-data.frame(cbind(name,credit,gender,group))
# binds columns together and uses it as a data frame
Additionally, we have a matrix derived from the data frame (in more complex cases this will be an array). This matrix contains the sum of the credit values of all contracts that fall into a particular category (characterized by m/f and A/B/C):
   m   f
A  40  NA
B  40  20
C  NA  50
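One way to build such a matrix in base R is tapply(), which applies sum() to the credit values of every group/gender combination. The sketch below assumes DF is created without cbind() so that credit stays numeric. Combinations with no rows come out as NA, and the columns appear in alphabetical order (f before m), which does not matter when values are looked up by name.

```r
name   <- c('n1','n2','n3','n4','n5')
credit <- c(10,20,30,40,50)
gender <- c('m','f','m','m','f')
group  <- c('A','B','A','B','C')
DF <- data.frame(name, credit, gender, group)   # no cbind, so credit stays numeric

# Sum of credit for every group x gender cell: rows are groups,
# columns are genders, empty cells are NA
derived <- tapply(DF$credit, list(DF$group, DF$gender), sum)
derived
```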
The goal is to multiply each value in DF$credit by the matrix value of its category, e.g. the value 10 in the first row of DF would be multiplied by 40 (the category defined by m and A).
The result would look like:
name credit gender group result
n1 10 m A 400
n2 20 f B 400
n3 30 m A 1200
n4 40 m B 1600
n5 50 f C 2500
If possible, I would like to do this with base R, but I am open to any helpful solution that works nicely.
Solution
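You can construct a set of indices into derived (your derived matrix) by making an index matrix out of DF$group and DF$gender, with one row per row of DF: the first column selects the row of derived and the second selects its column. The as.character calls are there because DF$group and DF$gender are factors, whereas I just want character indices that are matched against the row and column names of derived.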
>idx = matrix( c(as.character(DF$group),as.character(DF$gender)),ncol=2)
>idx
[,1] [,2]
[1,] "A" "m"
[2,] "B" "f"
[3,] "A" "m"
[4,] "B" "m"
[5,] "C" "f"
>DF$result = DF$credit * derived[idx]
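Note that with the code you used above to generate DF, the numeric columns turn out as factors (i.e. DF$credit is a factor), because cbind() converts everything to character first. In that case, convert it with as.numeric(as.character(DF$credit)) * derived[idx]; calling as.numeric() directly on a factor returns its internal level codes (1, 2, ...) rather than the values 10, 20, .... (In R 4.0.0 and later, data.frame() no longer turns strings into factors by default, so DF$credit is a character column instead and as.numeric(DF$credit) works directly.) However, I imagine that in your actual data DF$credit is numeric rather than a factor.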
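Since the question mentions that in more complex cases the category table will be an array, the same indexing idea can be sketched for three dimensions: index with a character matrix that has one column per array dimension, in the same order as the dimensions. The region column below is purely hypothetical, added only to produce a 3-d array, and tapply() is used here as an assumed way of building that array.

```r
DF <- data.frame(
  name   = c('n1','n2','n3','n4','n5'),
  credit = c(10, 20, 30, 40, 50),
  gender = c('m','f','m','m','f'),
  group  = c('A','B','A','B','C'),
  region = c('x','x','x','y','y')   # hypothetical third category
)

# Sum of credit per group x gender x region cell (NA for empty cells)
derived3 <- tapply(DF$credit, list(DF$group, DF$gender, DF$region), sum)

# One index column per dimension, matched against the array's dimnames
idx3 <- cbind(as.character(DF$group),
              as.character(DF$gender),
              as.character(DF$region))

DF$result <- DF$credit * derived3[idx3]
DF
```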
OTHER TIPS
When you create the data.frame object, don't use cbind: it is not necessary, and because it builds a character matrix it turns the credit variable into a factor (or, in R 4.0.0 and later, a character column).
Just use DF <- data.frame(name, credit, gender, group)
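A small sketch of the coercion described above: cbind() on a character vector and a numeric vector produces a character matrix, while data.frame() on the original vectors keeps credit numeric. The last lines show how to recover numbers from a column that has already become a factor; the factor f is a made-up example.

```r
name   <- c('n1','n2','n3','n4','n5')
credit <- c(10,20,30,40,50)

# cbind() yields a single character matrix, so credit becomes strings
m_chr <- cbind(name, credit)
typeof(m_chr)                 # "character"

# data.frame() on the original vectors keeps credit numeric
DF <- data.frame(name, credit)
is.numeric(DF$credit)         # TRUE

# For a factor, as.numeric() gives the level codes (levels sort as "10","20","30"),
# so convert via as.character() to get the actual values back
f <- factor(c("30", "10", "20"))
as.numeric(f)                 # 3 1 2
as.numeric(as.character(f))   # 30 10 20
```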
Then run a for loop that goes through each row in your data.frame object.
n <- length(DF$credit)
result <- rep(0, n)
# For each row, multiply its credit by the total credit of its gender/group category
for (i in seq_len(n)) {
  result[i] <- DF$credit[i] * sum(DF$credit[DF$gender == DF$gender[i] & DF$group == DF$group[i]])
}
Replace your data.frame object with this new one that includes your results.
DF <- data.frame(name, credit, gender, group, result)
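The loop can also be replaced by a single vectorized base R call. This sketch swaps in ave(), which returns, for every row, FUN applied to all credit values sharing that row's gender and group, i.e. the same category sum the loop computes with sum() on each iteration.

```r
name   <- c('n1','n2','n3','n4','n5')
credit <- c(10,20,30,40,50)
gender <- c('m','f','m','m','f')
group  <- c('A','B','A','B','C')
DF <- data.frame(name, credit, gender, group)

# Per-row total credit of the row's gender/group category
cell_total <- ave(DF$credit, DF$gender, DF$group, FUN = sum)

# Multiply each credit by its category total
DF$result <- DF$credit * cell_total
DF
```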
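I recommend the plyr package, but you can do this with the base by function (here m is the category matrix from the question, with groups as row names and genders as column names):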
> by(DF, DF['name'], function (row) row$credit * m[as.character(row$group), as.character(row$gender)])
name: n1
[1] 400
---------------------------------------------------------------------
name: n2
[1] 400
---------------------------------------------------------------------
name: n3
[1] 1200
---------------------------------------------------------------------
name: n4
[1] 1600
---------------------------------------------------------------------
name: n5
[1] 2500
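To store the by() output as a column, one option is to strip the "by" object down to a plain vector with as.numeric(). This is a sketch under two assumptions: m is built here with tapply(), and the names are unique and already in sorted order, because by() returns one value per sorted level of name rather than per original row.

```r
name   <- c('n1','n2','n3','n4','n5')
credit <- c(10,20,30,40,50)
gender <- c('m','f','m','m','f')
group  <- c('A','B','A','B','C')
DF <- data.frame(name, credit, gender, group)

# Category matrix: groups as rows, genders as columns (NA for empty cells)
m <- tapply(DF$credit, list(DF$group, DF$gender), sum)

# One result per level of name; as.numeric() drops the class and dimnames
res <- by(DF, DF['name'], function (row) row$credit * m[as.character(row$group), as.character(row$gender)])
DF$result <- as.numeric(res)
DF
```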
plyr can give you the result as a data frame, which is nice:
> library(plyr)
> ddply(DF, .(name), function (row) row$credit * m[as.character(row$group), as.character(row$gender)])
name V1
1 n1 400
2 n2 400
3 n3 1200
4 n4 1600
5 n5 2500