Question

I have a data frame (451 obs. of 8 variables) in which two columns (6 and 7) look like this:

  Major      Minor
  C:726      T:2
  A:687      G:41
  T:3        C:725

I want to create one column that summarises this. I don't care about the letters in each cell; I just want to keep the larger number of each row, whichever column it's in. That is, I want it to look like this:

  Summary_column
  726
  687
  725

Not necessary, but for those who wonder what I'm doing: this is the output from a programme called VCFtools. It has a count function that counts alleles in a VCF, but sometimes it labels an allele as "Minor" when it is clearly the more common one.

Thanks for your help!

Solution

I would do something like this:

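# Drop everything up to and including the colon and return the count as a number
extract <- function(v) {
  as.numeric(gsub("^.*:", "", v))
}
# Row-wise maximum of the two counts
within(d, Summary_column <- pmax(extract(Major), extract(Minor)))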

Which gives:

  Major Minor Summary_column
1 C:726   T:2            726
2 A:687  G:41            687
3   T:3 C:725            725
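
For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same approach. The data frame `d` is rebuilt by hand from the three rows in the question, plus a hypothetical fourth row (`G:41` / `A:5`) that is not in the original data; the helper name `allele_count` is also just an illustrative choice. The counts are converted with `as.numeric()` so that `pmax()` compares numbers. If the extracted values were left as character strings, the comparison would be alphabetical, and in the fourth row "5" would be ranked above "41".

```r
# Hypothetical sample data mirroring the layout in the question;
# the fourth row is an added illustration, not part of the original data.
d <- data.frame(
  Major = c("C:726", "A:687", "T:3", "G:41"),
  Minor = c("T:2", "G:41", "C:725", "A:5"),
  stringsAsFactors = FALSE
)

# Strip everything up to and including the colon, then convert to numeric
# so that the comparison below is numeric rather than alphabetical.
allele_count <- function(v) {
  as.numeric(sub("^.*:", "", v))
}

# pmax() returns the element-wise (row-wise) maximum of the two vectors.
d$Summary_column <- pmax(allele_count(d$Major), allele_count(d$Minor))

print(d)
```

Assigning with `d$Summary_column <- ...` modifies `d` in place, whereas `within()` returns a modified copy that still has to be assigned back to a variable. Either works; it is a matter of taste.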
Licensed under: CC-BY-SA with attribution