Use array result as multiplier for the original data frame

https://stackoverflow.com/questions/8583480

23-03-2021

Question

For a given data frame, I would like to multiply a column of the data frame by values from an array. The data frame consists of rows containing a name, a numerical value and two factor values:

name  credit  gender  group
n1        10  m       A
n2        20  f       B
n3        30  m       A
n4        40  m       B
n5        50  f       C
n5 50 f C

This data frame can be generated using the commands:

name    <- c('n1','n2','n3','n4','n5')
credit  <- c(10,20,30,40,50)
gender  <- c('m','f','m','m','f')
group   <- c('A','B','A','B','C')
DF      <- data.frame(cbind(name,credit,gender,group))
# binds the columns together into a matrix and converts it to a data frame

Additionally, we have a matrix derived from the data frame (in more complex cases this will be an array). It contains the summed credit of all contracts that fall into each category (characterized by m/f and A/B/C):

   m f
A 40 NA
B 40 20
C NA 50
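The question does not show how this matrix is computed. A minimal base R sketch, assuming it is simply credit summed within each group/gender cell, uses tapply(); the name derived is the one the accepted answer uses for this matrix.

```r
# Recreate the example columns as plain vectors
name   <- c("n1", "n2", "n3", "n4", "n5")
credit <- c(10, 20, 30, 40, 50)
gender <- c("m", "f", "m", "m", "f")
group  <- c("A", "B", "A", "B", "C")

# Sum credit within every group x gender cell: rows are groups,
# columns are genders, and cells with no contracts become NA
derived <- tapply(credit, list(group, gender), sum)

# tapply() sorts the levels alphabetically (f before m), so reorder
# the columns to match the layout shown in the question
derived <- derived[, c("m", "f")]
print(derived)
```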

The goal is to multiply each value in DF$credit by the matrix entry for its category; e.g. the value 10 in the first row of DF would be multiplied by 40 (the category defined by m and A).

The result would look like:

name  credit  gender  group  result
n1        10  m       A         400
n2        20  f       B         400
n3        30  m       A        1200
n4        40  m       B        1600
n5        50  f       C        2500

If possible, I would like to do this using base R, but I am open to any helpful solutions that work nicely.

Solution

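You can construct a set of indices into derived (your derived matrix) by making a two-column index matrix out of DF$group and DF$gender: the first column selects the row (group) and the second the column (gender). The as.character calls are there because DF$group and DF$gender are factors, whereas I want character indices that are matched against the matrix's row and column names.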

> idx <- matrix(c(as.character(DF$group), as.character(DF$gender)), ncol = 2)
> idx
     [,1] [,2]
[1,] "A"  "m"
[2,] "B"  "f"
[3,] "A"  "m"
[4,] "B"  "m"
[5,] "C"  "f"
> DF$result <- DF$credit * derived[idx]

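Note that with the last line, if you use the code from the question to generate DF, the numeric column turns out as a factor (i.e. DF$credit is a factor; in R 4.0.0 and later, where stringsAsFactors defaults to FALSE, it is a character column instead). In that case you need as.numeric(as.character(DF$credit)) * derived[idx]; calling as.numeric() directly on a factor returns its internal level codes (1, 2, 3, ...) rather than the original numbers. However, I imagine that in your actual data DF$credit is numeric rather than a factor.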
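Putting the pieces together, here is a self-contained sketch of this approach. It assumes derived is built with tapply() (the question does not show how it was computed) and creates DF without cbind() so that credit stays numeric.

```r
name   <- c("n1", "n2", "n3", "n4", "n5")
credit <- c(10, 20, 30, 40, 50)
gender <- c("m", "f", "m", "m", "f")
group  <- c("A", "B", "A", "B", "C")

# Build the data frame without cbind() so credit stays numeric
DF <- data.frame(name, credit, gender, group)

# Assumed derived matrix: credit summed per group (rows) and gender (columns)
derived <- tapply(DF$credit, list(DF$group, DF$gender), sum)

# Two-column character index matrix: column 1 is matched against the
# row names (group), column 2 against the column names (gender)
idx <- cbind(as.character(DF$group), as.character(DF$gender))

# Indexing a matrix with a character matrix returns one cell per index row
DF$result <- DF$credit * derived[idx]
print(DF)
```

Because the lookup is matched by name, the column order of derived (m/f or f/m) does not matter here.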

OTHER TIPS

When you create the data.frame object, don't use cbind: it's not necessary, and it forces the credit variable to become a factor (a character column in R 4.0.0 and later).

Just use DF <- data.frame(name, credit, gender, group)
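A quick way to see the problem, plus the safe conversion if credit has already become a factor (the vector f below is a hypothetical stand-in for such a column):

```r
name   <- c("n1", "n2", "n3", "n4", "n5")
credit <- c(10, 20, 30, 40, 50)

# cbind() on a character and a numeric vector yields a character
# matrix, so the numbers are already strings before data.frame() runs
m_chr <- cbind(name, credit)
print(is.character(m_chr))          # TRUE

# Passing the vectors directly keeps credit numeric
DF_ok <- data.frame(name, credit)
print(is.numeric(DF_ok$credit))     # TRUE

# If credit did end up as a factor, convert via its labels:
# as.numeric() on a factor returns the level codes, not the values
f <- factor(c("10", "20", "30", "40", "50"))
print(as.numeric(f))                # 1 2 3 4 5
print(as.numeric(as.character(f)))  # 10 20 30 40 50
```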

Then run a for loop over each row of your data.frame object, multiplying each credit by the total credit of its gender/group category.

n <- length(DF$credit)
result <- rep(0, n)
for (i in seq_len(n)) {
  # total credit of all rows sharing row i's gender and group
  cell_sum <- sum(DF$credit[DF$gender == DF$gender[i] & DF$group == DF$group[i]])
  result[i] <- DF$credit[i] * cell_sum
}

Replace your data.frame object with this new one that includes your results.

DF <- data.frame(name, credit, gender, group, result)
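The loop recomputes one category sum per row. A vectorized base R alternative (a swapped-in technique, not part of this tip) uses ave(), which returns each row's gender/group sum as a vector aligned with the rows of DF:

```r
name   <- c("n1", "n2", "n3", "n4", "n5")
credit <- c(10, 20, 30, 40, 50)
gender <- c("m", "f", "m", "m", "f")
group  <- c("A", "B", "A", "B", "C")
DF <- data.frame(name, credit, gender, group)

# ave() applies FUN within each gender x group combination and puts the
# result back in row order: the same per-row sum the loop computes
cell_sum <- ave(DF$credit, DF$gender, DF$group, FUN = sum)

DF$result <- DF$credit * cell_sum
print(DF)
```

This needs no separate derived matrix, because the category totals are recomputed from DF itself.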

I recommend the plyr package, but you can also do this with the base by function (here m is the derived matrix):

> by(DF, DF['name'], function (row) row$credit * m[as.character(row$group), as.character(row$gender)])
name: n1
[1] 400
--------------------------------------------------------------------- 
name: n2
[1] 400
--------------------------------------------------------------------- 
name: n3
[1] 1200
--------------------------------------------------------------------- 
name: n4
[1] 1600
--------------------------------------------------------------------- 
name: n5
[1] 2500
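by() returns a "by" object, so the values still have to be extracted before they can be stored as a column. A base R sketch that returns a plain numeric vector instead uses mapply(); it assumes m is built with tapply(), since the answer does not show how m was created.

```r
name   <- c("n1", "n2", "n3", "n4", "n5")
credit <- c(10, 20, 30, 40, 50)
gender <- c("m", "f", "m", "m", "f")
group  <- c("A", "B", "A", "B", "C")
DF <- data.frame(name, credit, gender, group)

# Assumed construction of m (rows = group, columns = gender)
m <- tapply(DF$credit, list(DF$group, DF$gender), sum)

# mapply() walks the three columns in parallel, one row per call,
# and simplifies the per-row products to a numeric vector
DF$result <- mapply(function(cr, g, s) cr * m[g, s],
                    DF$credit, as.character(DF$group), as.character(DF$gender),
                    USE.NAMES = FALSE)
print(DF)
```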

plyr can give you the result as a data frame, which is nice:

> library(plyr)
> ddply(DF, .(name), function (row) row$credit * m[as.character(row$group), as.character(row$gender)])
  name   V1
1   n1  400
2   n2  400
3   n3 1200
4   n4 1600
5   n5 2500

Licensed under: CC-BY-SA with attribution
