I will change your matrix definition slightly so that the "NA" strings become actual missing values (NA), which have a special meaning in R that is close to the behavior you want.
mat <- matrix("", 10, 12)
mat[c(1, 4, 6),] <- sample(c("AA", "AB", "BB"), 18, TRUE)
mat[c(2, 3, 10),] <- sample(c("AA", "BB", "AB"), 18, TRUE)
mat[c(5, 8),] <- sample(c("BB", "AB", "BB"), 12, TRUE)
mat[c(7, 9),] <- sample(c("AA", "AA", "BB"), 12, TRUE)
mat[3,4] <- NA
mat[2,5] <- NA
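To illustrate why this distinction matters, here is a small sketch using a made-up vector (not your data). The string "NA" is ordinary text, whereas NA is recognized by functions such as is.na() and complete.cases(). The last step shows one possible way to convert existing "NA" strings, assuming your real data stores them as text.

```r
# Hypothetical example vector: one "NA" string and one true missing value
x <- c("AA", "NA", NA)

# Only the true NA is reported as missing
is.na(x)

# Convert "NA" strings to real missing values;
# %in% never returns NA, so it is safe to use as an index here
x[x %in% "NA"] <- NA

# Now both of the last two elements are missing
is.na(x)
```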
You also have not provided the values for all possible matches, so I am going to make some assumptions. These values can be changed without breaking the code.
For step 1, I am going to make a named vector that can be indexed using the pair names pasted together, so AA vs AB becomes 'AAAB'.
pair <- c('AAAA', 'AAAB', 'AABB', 'ABAB', 'ABBB', 'BBBB')
value <- c(1, 0.5, 0, 0.5, 0.5, 1)
# add reverse pairing (I am assuming symmetry)
pair <- c(pair, paste0(substr(pair, 3, 4), substr(pair, 1, 2)))
value <- c(value, value)
names(value) <- pair
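As a quick check of how the lookup behaves, the sketch below rebuilds the same vector (with my assumed values) and indexes it with pasted genotype pairs. The input genotypes are made up for illustration. Note that a key that is not in the vector, such as one built from an NA, returns NA rather than raising an error.

```r
# Same named vector as above, using the assumed match values
pair <- c("AAAA", "AAAB", "AABB", "ABAB", "ABBB", "BBBB")
value <- c(1, 0.5, 0, 0.5, 0.5, 1)
pair <- c(pair, paste0(substr(pair, 3, 4), substr(pair, 1, 2)))
value <- c(value, value)
names(value) <- pair

# A pair and its reverse map to the same value
value["AAAB"]
value["ABAA"]

# Vectorized lookup over two hypothetical genotype vectors;
# paste0() turns NA into the text "NA", so the last key is "NAAA"
keys <- paste0(c("AA", "AB", "BB", NA), c("BB", "AB", "AB", "AA"))
scores <- unname(value[keys])
scores
```

The last element of scores is NA because "NAAA" is not a name in value, which is what allows missing positions to be dropped later with na.rm=TRUE.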
Check how the vector value looks at this point to make sure it's what you want. Next we define a function that uses this globally defined vector and returns what you want at the end of step 4. You may want to include the vector definition in the function body, but I feel that would be less efficient, since the vector would then be rebuilt on every call.
compare <- function(row1, row2){
  # get total value of match from 2 vectors
  # count complete cases (positions where neither row is NA)
  good.cases <- sum(complete.cases(cbind(row1, row2)))
  # positions with an NA in either row each contribute 0.5
  na.cases <- length(row1) - good.cases
  # pairs containing NA are not names in value, so they index to NA and are dropped
  total.value <- sum(value[paste0(row1, row2)], na.rm=TRUE) + 0.5*na.cases
  total.value/good.cases
}
At this point I get a total.value of 6.5 from comparing the first 2 rows, but that is probably due to a wrong assumption in value.
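Because the matrix is filled with sample(), the numbers will differ from run to run. A deterministic check on two hand-written rows can be sketched as follows. The rows are made up, the match values are my assumptions from above, and compare_rows is a hypothetical standalone variant of the comparison that counts complete positions with sum().

```r
# Assumed match values (symmetric), as defined earlier
pair <- c("AAAA", "AAAB", "AABB", "ABAB", "ABBB", "BBBB")
value <- c(1, 0.5, 0, 0.5, 0.5, 1)
pair <- c(pair, paste0(substr(pair, 3, 4), substr(pair, 1, 2)))
value <- c(value, value)
names(value) <- pair

# Hypothetical helper: same logic as the comparison described above
compare_rows <- function(row1, row2) {
  # number of positions where neither row is NA
  good.cases <- sum(complete.cases(cbind(row1, row2)))
  # remaining positions contain an NA and contribute 0.5 each
  na.cases <- length(row1) - good.cases
  total.value <- sum(value[paste0(row1, row2)], na.rm = TRUE) + 0.5 * na.cases
  total.value / good.cases
}

# Two made-up rows with one missing genotype
r1 <- c("AA", "AB", "BB", NA)
r2 <- c("AA", "BB", "AA", "AB")

# Matches: AA/AA = 1, AB/BB = 0.5, BB/AA = 0, plus 0.5 for the NA position,
# so the total is 2, divided by the 3 complete positions
result <- compare_rows(r1, r2)
result
```

Working through a case like this by hand is an easy way to confirm that the values in value, and the divisor, match what you intend.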
For step 5, we use a double loop:
# start empty matrix with match values
n <- nrow(mat)
matches <- matrix(NA_real_, nrow=n, ncol=n)
for (i in seq_len(n)){
  for (j in i:n){ ## if symmetric, only half matrix is enough
    matches[i, j] <- compare(mat[i, ], mat[j, ])
  }
}
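The loop only fills the upper triangle (including the diagonal). If you need the full matrix and the comparison really is symmetric, the lower triangle can be mirrored afterwards. The sketch below shows the idea on a small made-up 3 x 3 matrix rather than on matches itself.

```r
# Hypothetical upper-triangular result, filled in column-major order
m <- matrix(NA_real_, 3, 3)
m[upper.tri(m, diag = TRUE)] <- c(1, 0.5, 1, 0.25, 0.75, 1)

# Copy the upper triangle into the lower triangle
full <- m
full[lower.tri(full)] <- t(full)[lower.tri(full)]

full
isSymmetric(full)
```

The same two lines can be applied to matches after the double loop has finished.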
I hope that helps.
Edit: Changed compare() to assign a value to NA cases, as requested in the comments.