An efficient way of converting a vector of characters into integers based on frequency in R

StackOverflow https://stackoverflow.com/questions/18044575

  •  23-06-2022

Question

I have a character vector consisting only of 'a' and 'g'. I want to convert it to integers based on frequency: the more frequent letter should be coded as 0 and the other as 1. For example:

set.seed(17)
x = sample(c('g', 'a'), 10, replace=TRUE)
x
# [1] "g" "a" "g" "a" "g" "a" "g" "g" "a" "g"
x[x == names(which.max(table(x)))] = 0
x[x != 0] = 1
x
# [1] "0" "1" "0" "1" "0" "1" "0" "0" "1" "0"

This works, but I wonder if there is a more efficient way to do it.

(We don't have to consider the 50%-50% case here, because it should never happen in our study.)

Solution

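Compare against one of the two letters once, then flip the result if that letter turns out to be the majority: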

ag.encode <- function(x)
{
  result <- x == "a"
  if( sum(result) > length(result) %/% 2 ) 1-result else as.numeric(result)
}
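
As a quick check, here is a minimal sketch that runs the function on two hand-written vectors. The inputs x1 and x2 are illustrative assumptions, not data from the question, and the function is repeated only so the snippet runs on its own.

```r
# ag.encode as given in this answer, repeated so the snippet is self-contained
ag.encode <- function(x)
{
  result <- x == "a"
  if( sum(result) > length(result) %/% 2 ) 1-result else as.numeric(result)
}

# Hypothetical input where "g" is the majority (3 of 5): g -> 0, a -> 1
x1 <- c("g", "a", "g", "g", "a")
ag.encode(x1)
# [1] 0 1 0 0 1

# Hypothetical input where "a" is the majority (3 of 5): a -> 0, g -> 1
x2 <- c("a", "a", "g", "a", "g")
ag.encode(x2)
# [1] 0 0 1 0 1
```

Both branches return a numeric (double) vector; wrap the call in as.integer() if a true integer vector is needed.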

If you want to keep the labels in a factor structure, use this instead:

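ag.encode2factor <- function(x)
{
  result <- x == "a"
  # levels=1:2 keeps both labels valid even when x contains only one letter
  if( sum(result) > length(result) %/% 2 )
  {
     # "a" is the majority: a -> level 1, g -> level 2
     factor(2-result, levels=1:2, labels=c("a","g"))
  }
  else
  {
     # "g" is the majority: g -> level 1, a -> level 2
     factor(result+1, levels=1:2, labels=c("g","a"))
  }
}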
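
To see how the factor version relates to the 0/1 coding, here is a small sketch under the same assumption of a hand-written input (x1 is hypothetical). The majority letter ends up as the first level, so subtracting 1 from the integer codes recovers the 0/1 encoding.

```r
# ag.encode2factor as given in this answer, repeated so the snippet is self-contained
ag.encode2factor <- function(x)
{
  result <- x == "a"
  if( sum(result) > length(result) %/% 2 )
  {
     factor(2-result, labels=c("a","g"))
  }
  else
  {
     factor(result+1, labels=c("g","a"))
  }
}

# Hypothetical input where "g" is the majority (3 of 5)
x1 <- c("g", "a", "g", "g", "a")
f <- ag.encode2factor(x1)

levels(f)
# [1] "g" "a"

# The first level is the majority letter, so codes minus 1 give the 0/1 coding
as.integer(f) - 1
# [1] 0 1 0 0 1
```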

OTHER TIPS

You can convert the character vector to a factor. This solution is more general in the sense that you don't need to know which two characters x contains.

y <- as.integer(factor(x))-1
if(sum(y)>length(y)/2) y <- as.integer(!y)
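
The two lines above can be wrapped in a function, sketched below. The name encode.by.freq and both test vectors are assumptions made for illustration. factor() sorts its levels, so the first level is coded 0 and the second 1 before the optional flip.

```r
# Hypothetical wrapper around the factor-based approach
encode.by.freq <- function(x)
{
  # Levels are sorted, so the first level becomes 0 and the second 1
  y <- as.integer(factor(x)) - 1L
  # If the level coded 1 is the majority, swap the codes
  if (sum(y) > length(y) / 2) y <- as.integer(!y)
  y
}

# "g" is the majority here, so the codes are flipped: g -> 0, a -> 1
encode.by.freq(c("g", "a", "g", "g", "a"))
# [1] 0 1 0 0 1

# Works for any two symbols: "c" is the majority, so c -> 0, t -> 1
encode.by.freq(c("t", "c", "c", "t", "c"))
# [1] 1 0 0 1 0
```

Like the original code, this sketch assumes exactly two distinct values and no 50%-50% tie.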
Licensed under: CC-BY-SA with attribution