Question

I would like to convert data in this format:

Samples Genotype  Region 
sample1    A      Region1
sample1    B      Region2
sample1    A      Region3
sample2    A      Region2
sample2    B      Region3
sample3    B      Region1
sample3    A      Region3

into a genotype matrix, with a tag for missing genotypes:

Samples   Region1 Region2 Region3
sample1     A       B      A
sample2     X       A      B
sample3     B       X      A

Is it possible to do this in R? Thanks a lot.

Solution

Your data:

dat <- read.table(text = "Samples Genotype  Region 
sample1    A      Region1
sample1    B      Region2
sample1    A      Region3
sample2    A      Region2
sample2    B      Region3
sample3    B      Region1
sample3    A      Region3", header = TRUE)

You can use the reshape2 package.

library(reshape2)
dat2 <- dcast(dat, Samples ~ Region, value.var = "Genotype")

In the result, missing values are indicated by NA:

#   Samples Region1 Region2 Region3
# 1 sample1       A       B       A
# 2 sample2    <NA>       A       B
# 3 sample3       B    <NA>       A

NA is the appropriate way to represent missing data in R. But if you need the tag, you can replace the NAs with Xs using the following command:

dat2[is.na(dat2)] <- "X"

#   Samples Region1 Region2 Region3
# 1 sample1       A       B       A
# 2 sample2       X       A       B
# 3 sample3       B       X       A
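
If you would rather not load a package, something similar can probably be done in base R by pre-filling a character matrix with "X" and then assigning the observed genotypes by row and column name. This is only a minimal sketch, not part of the answer above: it assumes each sample/region pair occurs at most once (with duplicates, the last value would silently win), and the names samples, regions, mat and res are just illustrative.

```r
dat <- read.table(text = "Samples Genotype Region
sample1 A Region1
sample1 B Region2
sample1 A Region3
sample2 A Region2
sample2 B Region3
sample3 B Region1
sample3 A Region3", header = TRUE, stringsAsFactors = FALSE)

samples <- unique(dat$Samples)
regions <- sort(unique(dat$Region))

# Start with every cell marked as missing ("X")
mat <- matrix("X", nrow = length(samples), ncol = length(regions),
              dimnames = list(samples, regions))

# A two-column character matrix indexes cells by row name and column name
mat[cbind(dat$Samples, dat$Region)] <- dat$Genotype

# Optional: turn the matrix into a data frame with Samples as the first column
res <- data.frame(Samples = rownames(mat), mat,
                  row.names = NULL, stringsAsFactors = FALSE)
```

Here the missing tag is set up front, so no NA replacement step is needed afterwards.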

OTHER TIPS

Here's the "base R" reshape equivalent of Sven's answer (+1, Sven):

reshape(dat, direction = "wide", idvar = "Samples", timevar="Region")
#   Samples Genotype.Region1 Genotype.Region2 Genotype.Region3
# 1 sample1                A                B                A
# 4 sample2             <NA>                A                B
# 6 sample3                B             <NA>                A

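Replace the NAs in the same manner if necessary. Note that this only works if the Genotype columns are character rather than factor, which is the read.table default from R 4.0 onwards; on older versions, read the data with stringsAsFactors = FALSE.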
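
For completeness, here is a sketch of the whole base R route, including the NA replacement and a cleanup of the "Genotype." prefix that reshape adds to the column names. The variable name wide and the sub() cleanup are my own additions for illustration; stringsAsFactors = FALSE is set explicitly so that the replacement also works on R versions before 4.0.

```r
dat <- read.table(text = "Samples Genotype Region
sample1 A Region1
sample1 B Region2
sample1 A Region3
sample2 A Region2
sample2 B Region3
sample3 B Region1
sample3 A Region3", header = TRUE, stringsAsFactors = FALSE)

wide <- reshape(dat, direction = "wide", idvar = "Samples", timevar = "Region")

# Drop the "Genotype." prefix so the columns are named Region1, Region2, ...
names(wide) <- sub("^Genotype\\.", "", names(wide))

# Tag missing genotypes with "X"
wide[is.na(wide)] <- "X"

# Reset the row names (reshape keeps the original row indices 1, 4, 6)
rownames(wide) <- NULL
```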

Licensed under: CC-BY-SA with attribution