Question

I have a data file that is several million lines long, and contains information from many groups. Below is an abbreviated section:

MARKER      GROUP1_A1   GROUP1_A2   GROUP1_FREQ GROUP1_N    GROUP2_A1   GROUP2_A2   GROUP2_FREQ GROUP2_N
rs10    A   C   0.055   1232    A   C   0.055   3221
rs1000  A   G   0.208   1232    A   G   0.208   3221
rs10000 G   C   0.134   1232    C   G   0.8624  3221
rs10001 C   A   0.229   1232    A   C   0.775   3221

I would like to create a weighted average of the frequency (FREQ) variable (which in itself is straightforward); however, in this case some of the rows are mismatched (rows 3 & 4). If the letters do not line up, then the frequency of the second group needs to be subtracted from 1 (that is, replaced by 1 - FREQ) before the weighted mean of that marker is calculated.

I would like to set up a simple IF statement, but I am unsure of the syntax for such a task.

Any insight or direction is appreciated!


Solution

Say you've read your data into a data frame called mydata. Then do the following:

mydata$GROUP2_FREQ <- mydata$GROUP2_FREQ - (mydata$GROUP1_A1 != mydata$GROUP2_A1)

It works because R treats TRUE values as 1 and FALSE values as 0.
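To see that coercion in isolation, here is a small sketch. The two vectors are the A1 alleles of the four sample rows in the question, typed in by hand; the variable names are made up for illustration.

```r
# A1 alleles from the four sample rows (names are illustrative only)
a1_group1 <- c("A", "A", "G", "C")
a1_group2 <- c("A", "A", "C", "A")

mismatch <- a1_group1 != a1_group2   # logical vector, TRUE where alleles differ
mismatch_num <- as.numeric(mismatch) # TRUE becomes 1, FALSE becomes 0
n_mismatch <- sum(mismatch)          # sum() counts the TRUE values

print(mismatch_num)
print(n_mismatch)
```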

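EDIT: The line above gives FREQ - 1 (a negative number) for mismatched rows rather than 1 - FREQ, and the != comparison raises an error if the allele columns were read in as factors with different level sets. Try the following instead: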

mydata$GROUP2_FREQ <- abs( (as.character(mydata$GROUP1_A1) != 
                            as.character(mydata$GROUP2_A1)) -                   
                          as.numeric(mydata$GROUP2_FREQ) )
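
Since the question asked for an IF statement and a weighted mean, here is one possible sketch of the whole calculation using ifelse(), the vectorized form of if/else. It is not part of the original answer. The data frame demo is a hand-typed copy of the sample rows (A2 columns omitted), and weighting by the N columns is an assumption about what "weighted" means here.

```r
# Hand-typed copy of the sample rows; "demo" is an illustrative name.
# stringsAsFactors = FALSE keeps alleles as character so != works directly.
demo <- data.frame(
  MARKER      = c("rs10", "rs1000", "rs10000", "rs10001"),
  GROUP1_A1   = c("A", "A", "G", "C"),
  GROUP1_FREQ = c(0.055, 0.208, 0.134, 0.229),
  GROUP1_N    = c(1232, 1232, 1232, 1232),
  GROUP2_A1   = c("A", "A", "C", "A"),
  GROUP2_FREQ = c(0.055, 0.208, 0.8624, 0.775),
  GROUP2_N    = c(3221, 3221, 3221, 3221),
  stringsAsFactors = FALSE
)

# TRUE where the first allele differs between the two groups
mismatch <- demo$GROUP1_A1 != demo$GROUP2_A1

# Vectorized IF: use 1 - FREQ for mismatched rows, FREQ otherwise
freq2 <- ifelse(mismatch, 1 - demo$GROUP2_FREQ, demo$GROUP2_FREQ)

# Assumption: weight each group's frequency by its sample size N
demo$WEIGHTED_FREQ <- (demo$GROUP1_FREQ * demo$GROUP1_N + freq2 * demo$GROUP2_N) /
  (demo$GROUP1_N + demo$GROUP2_N)

print(demo[, c("MARKER", "WEIGHTED_FREQ")])
```

For these inputs, ifelse() gives the same adjusted frequencies as the abs() expression above; it just spells out the condition. Both are vectorized, so they should scale to a file with millions of rows.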
Licensed under: CC-BY-SA with attribution