Divide only one column of nested data.frames by an integer value
Question
I wish to divide the third column of a dataframe by 5. These dataframes are nested and look like this:
[[44]]
Ethnicity Variant Sum
1: ASW ACCEPTOR 1
2: ASW CDS 68
3: ASW CGA_CNVWIN 1000
4: ASW CGA_MIRB 0
5: ASW DELETE 0
6: ASW DISRUPT 0
7: ASW DONOR 0
8: ASW FRAMESHIFT 0
9: ASW INSERT 1
10: ASW INTRON 54
I have tried three commands; each one partly works but has unwanted side effects.
lapply(ASWldtSUM,function(x)(x/5))
returns
[[44]]
Ethnicity Variant Sum
1: NA NA 0.2
2: NA NA 13.6
3: NA NA 200.0
4: NA NA 0.0
5: NA NA 0.0
which has the unfortunate effect of dividing ALL columns by 5, so every column that is not numeric (everything except $Sum) becomes NA.
lapply(ASWldtSUM,function(x[,3])(x/5))
has the effect of returning only a single vector, which would work nicely if this were not a nested array of dataframes, but the statement
ASWdtSUM$NEWCOL<-lapply(ASWldtSUM,function(x[,3])(x/5))
cannot simply be written, because the data frames are nested inside a list.
Using rapply as in the following statement:
rapply(ASWldtSUM,function(x) if (is.integer(x)) {(x/5)})
leads to a disordering of the results.
So, is there a simple way to either append a 4th column to each nested DataFrame, or to replace the third column of each DF (Sum) with that value divided by 5?
Solution
This is simple. If ASWldtSUM is the name of the list containing the data frames, you can do:
lapply(ASWldtSUM,FUN=function(x) { x[,3]=x[,3]/5; return(x) })
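Basically, the function replaces the entire third column with that column divided by five and then returns the modified data frame. Note that lapply returns a new list, so assign the result (for example, back to ASWldtSUM) if you want to keep it.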
In practice:
> # create a first data.frame (equal to your example)
> ASWldtSUM1=data.frame(Ethnicity=rep('ASW',10),Variant=c("ACCEPTOR","CDS","CGA_CNVWIN","CGA_MIRB","DELETE","DISRUPT","DONOR","FRAMESHIFT","INSERT","INTRON"), Sum=c(1,68,1000,0,0,0,0,0,1,54))
> # create a second data.frame (with different values for the third column)
> ASWldtSUM2=data.frame(Ethnicity=rep('ASW',10),Variant=c("ACCEPTOR","CDS","CGA_CNVWIN","CGA_MIRB","DELETE","DISRUPT","DONOR","FRAMESHIFT","INSERT","INTRON"), Sum=c(1,2,3,4,5,6,7,8,9,10))
> # create a list of data frames
> ASWldtSUM=list(ASWldtSUM1,ASWldtSUM2)
> # apply the function that divides the third column to each nested data.frame
> lapply(ASWldtSUM,FUN=function(x) { x[,3]=x[,3]/5; return(x) })
[[1]]
Ethnicity Variant Sum
1 ASW ACCEPTOR 0.2
2 ASW CDS 13.6
3 ASW CGA_CNVWIN 200.0
4 ASW CGA_MIRB 0.0
5 ASW DELETE 0.0
6 ASW DISRUPT 0.0
7 ASW DONOR 0.0
8 ASW FRAMESHIFT 0.0
9 ASW INSERT 0.2
10 ASW INTRON 10.8
[[2]]
Ethnicity Variant Sum
1 ASW ACCEPTOR 0.2
2 ASW CDS 0.4
3 ASW CGA_CNVWIN 0.6
4 ASW CGA_MIRB 0.8
5 ASW DELETE 1.0
6 ASW DISRUPT 1.2
7 ASW DONOR 1.4
8 ASW FRAMESHIFT 1.6
9 ASW INSERT 1.8
10 ASW INTRON 2.0
> #desired result
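The question also asks about appending a fourth column instead of overwriting Sum. A minimal sketch of that variant follows, assuming plain data frames with a column named Sum; the list name dfs and the new column name SumDiv5 are made up for illustration and do not come from the original post.

```r
# Toy list of data frames; `dfs` and `SumDiv5` are hypothetical names.
dfs <- list(
  data.frame(Ethnicity = "ASW", Variant = c("ACCEPTOR", "CDS", "INTRON"),
             Sum = c(1L, 68L, 54L)),
  data.frame(Ethnicity = "ASW", Variant = c("ACCEPTOR", "CDS", "INTRON"),
             Sum = c(5L, 10L, 15L))
)

# Add a new column to every data frame and leave Sum untouched.
# The result is assigned back, because lapply returns a new list.
dfs <- lapply(dfs, function(x) {
  x$SumDiv5 <- x$Sum / 5
  x
})

dfs[[2]]
```

Referring to the column by name rather than by position keeps the code working if the column order ever changes.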
OTHER TIPS
There are many ways of doing this. Here is one:
Create some sample data:
dat <- lapply(1:3, function(x)data.frame(a=sample(letters, 4), b=sample(LETTERS, 4), z=rnorm(4)))
dat
[[1]]
a b z
1 r M 0.3054329
2 v I -0.8051859
3 t Q -1.6082701
4 u D -0.2315290
[[2]]
a b z
1 j W -0.4692469
2 f S 0.3112689
3 a D 0.4208704
4 w Z 0.6903139
[[3]]
....
Next, use a small anonymous function inside lapply(). For better illustration, I multiply by 100 rather than divide by 5:
lapply(dat, function(x){x[3] <- x[3]*100; x})
[[1]]
a b z
1 r M 30.54329
2 v I -80.51859
3 t Q -160.82701
4 u D -23.15290
[[2]]
a b z
1 j W -46.92469
2 f S 31.12689
3 a D 42.08704
4 w Z 69.03139
[[3]]
....
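If the position of the numeric column is not known in advance, the columns can be picked by type instead, which is roughly what the rapply attempt in the question was aiming for. The following is only a sketch under that assumption; the helper name scale_numeric and its divisor argument are hypothetical, and it rescales every numeric column, not just one.

```r
# Sample data in the same shape as above (seed fixed for reproducibility).
set.seed(1)
dat <- lapply(1:3, function(i) {
  data.frame(a = sample(letters, 4), b = sample(LETTERS, 4), z = rnorm(4),
             stringsAsFactors = FALSE)
})

# Hypothetical helper: divide every numeric column by `divisor`,
# leaving character and factor columns as they are.
scale_numeric <- function(x, divisor = 5) {
  num <- vapply(x, is.numeric, logical(1))   # TRUE for integer and double columns
  x[num] <- lapply(x[num], function(col) col / divisor)
  x
}

res <- lapply(dat, scale_numeric)
res[[1]]
```

Because is.numeric is FALSE for factors and character vectors, the non-numeric columns pass through unchanged instead of turning into NA.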