Question

Here is my small dataset and here is a function:

dat <- data.frame (
 A1 = c("AA", "AA", "AA", "AA"),
 B1 = c("BB", "BB", "AB", "AB"), 
 C1 = c("AB", "BB", "AA", "AB"))

The function

syfun <- function (x, y){

if(x == "AA" & y == "AA" | x == "BB" & y == "BB"){
        sxy = 1
}
if(x == "AA" & y == "AB" | x == "AB" & y == "AA"){
    sxy = 0.5
}
if (x == "AA" & y == "BB"| x == "BB" & y == "AA"){
    sxy = 0
}
return(sxy)
}

out <- rep (NA, NROW(dat))

for (i in 1:NROW(dat)){
out[i] <- syfun (dat[i,1], dat[i,1])
}

mean(out)
1

Here what I am trying to do is apply the function with first column (variable A) with same variable (variable A1) and average the output value. I want to save this output to a cell of matrix.

Similarly between A1 and B1.

   for (i in 1:NROW(dat)){
    out[i] <- syfun (dat[i,1], dat[i,2])
    }
    mean(out)
    0.25

Now similar to correlation matrix, I want to save all possible combination between variable to make a matrix like.

         A1    B1    C1
A1       1.0  0.25  0.5
B1       0.25  1.0  NA
C1       0.5   NA   1.0

Edits: More complete function that do not produce NAs

syfun <- function (x, y){
  sxy <- NA
  if(x == "AA" & y == "AA" | x == "BB" & y == "BB"){
        sxy = 1
  }
  if(x == "AA" & y == "AB" | x == "AB" & y == "AA"){
        sxy = 0.5
  }
  if (x == "AA" & y == "BB"| x == "BB" & y == "AA"){
        sxy = 0
  }
  if (x == "BB" & y == "AB"| x == "AB" & y == "BB"){
        sxy = 0.5
  }

  if(x == "AB" & y ==  "AB") {
    sxy = 0.5
    }
  return(sxy)
}
Was it helpful?

Solution

First, your function syfun has to return NA if there is no match. Hence, I added a line at the top of the function:

syfun <- function (x, y){
  sxy <- NA
  if(x == "AA" & y == "AA" | x == "BB" & y == "AA"){
        sxy = 1
  }
  if(x == "AA" & y == "AB" | x == "AB" & y == "AA"){
        sxy = 0.5
  }
  if (x == "AA" & y == "BB"| x == "BB" & y == "AA"){
        sxy = 0
  }
  return(sxy)
}

Second, you can use outer to apply the function to all combinations. You need to use Vectorize to vectorize the function:

mat <- outer(names(dat), names(dat), function(x, y) 
  Vectorize(function(a, b) mean(Vectorize(syfun)(dat[[a]], dat[[b]])))(x,y))

Third, replace the elements on the diagonal with 1:

diag(mat) <- 1

Fourth, set row and column names:

dimnames(mat) <- list(names(dat), names(dat))

The result:

     A1   B1  C1
A1 1.00 0.25 0.5
B1 0.25 1.00  NA
C1 0.50   NA 1.0

OTHER TIPS

From your example it looks like you just want to know the proportion of As in one compared with the As in another to compute their similarity. If that is the case then: (I'm assuming these are genes?)

dat <- data.frame (
 A1 = c("AA", "AA", "AA", "AA"),
 B1 = c("BB", "BB", "AB", "AB"), 
 C1 = c("AB", "BB", "AA", "AB"))

## this function takes the columns from dat,  pastes all the genes together, then counts the number of each that appears. It then divides the smaller by the larger to give you a percent similar (only does it for 'A' right now, but I could expand that to more genes if necessary)

fun <-  function(x,y){
  x.prop <- table(unlist(strsplit(Reduce(paste0, x),'*')))
  y.prop <- table(unlist(strsplit(Reduce(paste0, y),'*')))
  ans <- ifelse(x.prop['A']>y.prop['A'], y.prop['A']/x.prop['A'], x.prop['A']/y.prop['A'])
  return(ans)
}

final_mat <- matrix(ncol=3,nrow=3) ## creates an empty final matrix
colnames(final_mat) <- colnames(dat)  
rownames(final_mat) <- colnames(dat)


### this applies 'fun' to each of the 2 combinations of column names
final_mat[upper.tri(final_mat)] <- apply(combn(colnames(dat),2),2,function(x) fun(dat[,x[1]], dat[,x[2]]))

final_mat[lower.tri(final_mat)] <- apply(combn(colnames(dat),2),2,function(x) fun(dat[,x[1]], dat[,x[2]]))

diag(final_mat) <- 1

final_mat
     A1   B1  C1
A1 1.00 0.25 0.5
B1 0.25 1.00 0.5
C1 0.50 0.50 1.0
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top