R: loop / function to create a matrix for comparison (contrasts)

https://stackoverflow.com/questions/8027691

22-02-2021

Question

I have the following type of data, i.e. combinations of factors:

P1 <- c("a", "a", "a", "a", "b", "b", "b", "c", "c", "d")
P2 <- c("a", "b", "c", "d", "b", "c", "d", "c", "d", "d")
myfactors <- data.frame(P1, P2)

   P1 P2
1   a  a
2   a  b
3   a  c
4   a  d
5   b  b
6   b  c
7   b  d
8   c  c
9   c  d
10  d  d

In the real data there may be any number of factor levels, so I am trying to write a function that works for any number of levels. I want to set up contrasts for all combinations available in the data set, for example a-b, a-c, a-d, b-c, b-d and c-d in this data set. The contrast rule is:

For example, for "a-b":
if P1 = P2 = a or P1 = P2 = b, the coefficient = -1;
if P1 = a, P2 = b or P1 = b, P2 = a, the coefficient = 2;
otherwise, the coefficient = 0.
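The rule can be written as a vectorised R function that returns one coefficient per row for a single contrast. This is a minimal sketch; the helper name `pair_coef` and its arguments are hypothetical and not part of the question.

```r
# Hypothetical helper: coefficient of every row for the contrast "l1-l2".
# p1, p2: character vectors holding the P1 and P2 columns.
pair_coef <- function(p1, p2, l1, l2) {
  # mixed row: (l1, l2) or (l2, l1) -> contributes 2
  mixed <- (p1 == l1 & p2 == l2) | (p1 == l2 & p2 == l1)
  # same-level row for either level of the pair -> contributes -1
  same <- p1 == p2 & p1 %in% c(l1, l2)
  # for l1 != l2 the two conditions never both hold, so all other rows get 0
  2 * mixed - same
}

P1 <- c("a", "a", "a", "a", "b", "b", "b", "c", "c", "d")
P2 <- c("a", "b", "c", "d", "b", "c", "d", "c", "d", "d")
pair_coef(P1, P2, "a", "b")   # returns c(-1, 2, 0, 0, -1, 0, 0, 0, 0, 0)
```

Taken literally, the rule gives -1 for the b/b row of the a-b contrast.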

The output coefficient matrix should look like the following:

P1  P2  a-b a-c a-d b-c b-d c-d
a   a   -1  -1  -1  0   0   0
a   b   2   0   0   0   0   0
a   c   0   2   0   0   0   0
a   d   0   0   2   0   0   0
b   b   1   0   0   -1  -1  0
b   c   0   0   0   2   0   0
b   d   0   0   0   0   2   0
c   c   0   1   0   0   0   -1
c   d   0   0   0   -1  0   2
d   d   0   0   -1  0   -1  -1

Since the function should be flexible, I also want to be able to apply it to the following data set:

P1 <- c("CI", "CI", "CI", "CD", "CD", "CK", "CK")
P2 <- c("CI", "CD", "CK", "CD", "CK", "CK", "CI")
mydf2 <- data.frame(P1, P2)
mydf2
  P1 P2
1 CI CI
2 CI CD
3 CI CK
4 CD CD
5 CD CK
6 CK CK
7 CK CI

The expected coefficient matrix for this dataframe is:

P1  P2  CI-CD    CI-CK  CD-CK   CK-CI
CI  CI    -1      -1      0   -1
CI  CD     2       0      0    0
CI  CK     0       2      0    0
CD  CD    -1       0     -1    0
CD  CK     0       0      2    0
CK  CK     0      -1     -1   -1
CK  CI     0       0      0    2

I tried several approaches but could not arrive at a working program.

EDITS:

(1) I am not testing all possible combinations; only the combinations that actually appear in P1 and P2 are tested.

(2) I want a general solution, not only one for this instance; it should also work for, for example, the myfactors data frame above.

Solution

You didn't supply a reason for your particular choice of the 6 ordered combinations of P1 and P2 values, so I just ran through them all:

# all unordered pairs of levels, followed by the same pairs in reversed order
combos <- cbind(combn(unique(c(P2, P1)), 2), combn(unique(c(P2, P1)), 2)[2:1, ])
combos
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "CI" "CI" "CD" "CD" "CK" "CK"
[2,] "CD" "CK" "CK" "CI" "CI" "CD"

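Working through the logic, it seemed more compact to test the two conditions directly and use Boolean arithmetic to produce the coefficients: TRUE for "the row is the pair, in either order" is multiplied by 2, and TRUE for "P1 == P2 and that level belongs to the pair" is subtracted as 1. If neither condition holds, you get 0. I've checked the entries that do not match yours, and I think your construction was wrong in spots. For example, you have 0 in the "CI-CK" column for row 7 (CK, CI), and by your rules I think it should be 2: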

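# one column per pair in combos: 2 for the mixed pair in either order,
# -1 for a same-level row of either level, 0 otherwise
sapply(seq_len(ncol(combos)), function(x) with(mydf2,
      2 * ((P1 == combos[1, x] & P2 == combos[2, x]) | (P2 == combos[1, x] & P1 == combos[2, x])) -
       (P1 == P2 & P1 %in% combos[, x])))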
#---------------
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]   -1   -1    0   -1   -1    0
[2,]    2    0    0    2    0    0
[3,]    0    2    0    0    2    0
[4,]   -1    0   -1   -1    0   -1
[5,]    0    0    2    0    0    2
[6,]    0   -1   -1    0   -1   -1
[7,]    0    2    0    0    2    0
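Because the rule treats the pairs (l1, l2) and (l2, l1) identically, columns 4 to 6 of this result repeat columns 1 to 3. The check below recomputes the same result under the hypothetical name `res`; it suggests that `combn()` alone, without the reversed half of `combos`, would give each contrast once.

```r
P1 <- c("CI", "CI", "CI", "CD", "CD", "CK", "CK")
P2 <- c("CI", "CD", "CK", "CD", "CK", "CK", "CI")
mydf2 <- data.frame(P1, P2)

pairs <- combn(unique(c(P2, P1)), 2)   # CI-CD, CI-CK, CD-CK
combos <- cbind(pairs, pairs[2:1, ])   # plus the reversed pairs
res <- sapply(seq_len(ncol(combos)), function(x) with(mydf2,
  2 * ((P1 == combos[1, x] & P2 == combos[2, x]) |
       (P2 == combos[1, x] & P1 == combos[2, x])) -
    (P1 == P2 & P1 %in% combos[, x])))

# each reversed pair reproduces its forward pair
identical(res[, 1:3], res[, 4:6])   # TRUE
```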

#------------------
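mydf2[ , 3:8] <- sapply(1:ncol(combos), function(x) with( mydf2,
      2*( (P1==combos[1,x] & P2 == combos[2,x]) | (P2==combos[1,x] & P1 == combos[2,x])) -
       (P1 == P2 & P1 %in% combos[,x]) ) )
# sapply() returns an unnamed matrix, so name the new columns "level1-level2"
names(mydf2)[3:8] <- paste(combos[1, ], combos[2, ], sep = "-")
mydf2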
 #-----------------
  P1 P2 CI-CD CI-CK CD-CK CD-CI CK-CI CK-CD
1 CI CI    -1    -1     0    -1    -1     0
2 CI CD     2     0     0     2     0     0
3 CI CK     0     2     0     0     2     0
4 CD CD    -1     0    -1    -1     0    -1
5 CD CK     0     0     2     0     0     2
6 CK CK     0    -1    -1     0    -1    -1
7 CK CI     0     2     0     0     2     0
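Edit (1) in the question asks only for the combinations that actually occur in P1 and P2, and edit (2) asks for something that works on any such data frame. A possible generalisation, sketched under the assumption that the columns are always named P1 and P2, takes the distinct ordered pairs from the rows where P1 != P2. The function name `contrast_matrix` is hypothetical.

```r
# Hypothetical function: add one contrast column per distinct (P1, P2)
# pair with P1 != P2 that occurs in df. Assumes columns named P1 and P2.
contrast_matrix <- function(df) {
  p1 <- as.character(df$P1)   # guards against factor columns (R < 4.0)
  p2 <- as.character(df$P2)
  off <- p1 != p2
  # distinct observed pairs, in order of first appearance
  pairs <- unique(cbind(p1[off], p2[off]))
  coefs <- vapply(seq_len(nrow(pairs)), function(i) {
    l1 <- pairs[i, 1]
    l2 <- pairs[i, 2]
    mixed <- (p1 == l1 & p2 == l2) | (p1 == l2 & p2 == l1)
    same <- p1 == p2 & p1 %in% c(l1, l2)
    2 * mixed - same
  }, numeric(length(p1)))
  # matrix() keeps the shape even for a single row; columns labelled "l1-l2"
  coefs <- matrix(coefs, nrow = length(p1),
                  dimnames = list(NULL, paste(pairs[, 1], pairs[, 2], sep = "-")))
  cbind(df, coefs)
}

myfactors <- data.frame(
  P1 = c("a", "a", "a", "a", "b", "b", "b", "c", "c", "d"),
  P2 = c("a", "b", "c", "d", "b", "c", "d", "c", "d", "d"))
mydf2 <- data.frame(
  P1 = c("CI", "CI", "CI", "CD", "CD", "CK", "CK"),
  P2 = c("CI", "CD", "CK", "CD", "CK", "CK", "CI"))

contrast_matrix(myfactors)   # contrasts a-b, a-c, a-d, b-c, b-d, c-d
contrast_matrix(mydf2)       # contrasts CI-CD, CI-CK, CD-CK, CK-CI
```

Because both (CI, CK) and (CK, CI) occur as rows in mydf2, this sketch produces both columns, as in the question's expected output; under the symmetric rule the two columns are identical.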

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow