Question

I wish to divide the third column of each data frame in a list by 5. The data frames are nested inside a list and look like this:

[[44]]
    Ethnicity      Variant  Sum
 1:       ASW     ACCEPTOR    1
 2:       ASW          CDS   68
 3:       ASW   CGA_CNVWIN 1000
 4:       ASW     CGA_MIRB    0
 5:       ASW       DELETE    0
 6:       ASW      DISRUPT    0
 7:       ASW        DONOR    0
 8:       ASW   FRAMESHIFT    0
 9:       ASW       INSERT    1
10:       ASW       INTRON   54

I have tried three commands. Each one runs, but each has unwanted side effects.

lapply(ASWldtSUM,function(x)(x/5))

returns

[[44]]
    Ethnicity Variant   Sum
 1:        NA      NA   0.2
 2:        NA      NA  13.6
 3:        NA      NA 200.0
 4:        NA      NA   0.0
 5:        NA      NA   0.0

which divides ALL columns by 5, not just Sum. Only the $Sum column is numeric, so the Ethnicity and Variant columns come back as NA.
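The first command misbehaves because `/` applied to a whole data frame is applied to every column. A minimal sketch of one way to limit the division to numeric columns is below; the list `lst` and its two small data frames are hypothetical stand-ins for ASWldtSUM, built with the same three column names as the question.

```r
# Hypothetical stand-in for ASWldtSUM: two small data frames with the question's columns
lst <- list(
  data.frame(Ethnicity = "ASW", Variant = c("ACCEPTOR", "CDS"), Sum = c(1, 68),
             stringsAsFactors = FALSE),
  data.frame(Ethnicity = "ASW", Variant = c("DELETE", "INTRON"), Sum = c(0, 54),
             stringsAsFactors = FALSE)
)

# Divide only the numeric columns; character columns are left untouched
divided <- lapply(lst, function(x) {
  num <- vapply(x, is.numeric, logical(1))  # TRUE for each numeric column
  x[num] <- x[num] / 5                      # arithmetic on the numeric sub-frame only
  x                                         # return the whole data frame
})

divided[[1]]
```

Unlike the accepted answer below, this selects columns by type rather than by position, so it would also divide any other numeric column present.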

lapply(ASWldtSUM, function(x) x[, 3] / 5)

returns only the divided column (one vector per data frame) instead of the full data frames. That would work nicely for a single data frame, but the statement

ASWldtSUM$NEWCOL <- lapply(ASWldtSUM, function(x) x[, 3] / 5)

does not work here, because ASWldtSUM is a list of data frames: it adds NEWCOL as a new element of the list rather than as a column of each data frame.
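A small sketch of why the `$NEWCOL` assignment misfires: `$<-` on a list adds a named list element, and the data frames inside are not modified. The list `lst` here is a hypothetical two-element stand-in for ASWldtSUM.

```r
# Hypothetical stand-in for ASWldtSUM: two data frames with the question's three columns
lst <- list(
  data.frame(Ethnicity = "ASW", Variant = c("ACCEPTOR", "CDS"), Sum = c(1, 68)),
  data.frame(Ethnicity = "ASW", Variant = c("DELETE", "INTRON"), Sum = c(0, 54))
)

# $ on the list adds a new *list element*, not a column in each data frame
lst$NEWCOL <- lapply(lst, function(x) x[, 3] / 5)

length(lst)     # 3: the two data frames plus the new element "NEWCOL"
names(lst)      # "" "" "NEWCOL"
ncol(lst[[1]])  # still 3: the data frames themselves are unchanged
```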

Using rapply as in the following statement:

rapply(ASWldtSUM,function(x) if (is.integer(x)) {(x/5)})

flattens the results into a single vector (rapply's default is how = "unlist"), so the per-data-frame structure is lost.
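The flattening can be reproduced on a small hypothetical list. With the default `how = "unlist"`, `rapply()` applies the function to every leaf (each column), the `NULL` results from the non-integer columns are dropped, and the remaining values are unlisted into one vector. The integer `Sum` column mirrors the question's `is.integer()` test; the data themselves are made up.

```r
# Hypothetical list of two data frames with an integer Sum column
lst <- list(
  data.frame(Variant = c("ACCEPTOR", "CDS"), Sum = c(1L, 68L), stringsAsFactors = FALSE),
  data.frame(Variant = c("DELETE", "INTRON"), Sum = c(0L, 54L), stringsAsFactors = FALSE)
)

# Default how = "unlist": results from all data frames end up in a single vector
flat <- rapply(lst, function(x) if (is.integer(x)) x / 5)

flat  # one numeric vector of length 4; which value came from which data frame is no longer explicit
```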

So, is there a simple way to either append a 4th column to each nested data frame, or to replace the third column (Sum) of each data frame with its value divided by 5?


Solution

It is simple: if ASWldtSUM is the name of the list containing the data frames, you can do:

lapply(ASWldtSUM,FUN=function(x) { x[,3]=x[,3]/5; return(x) })

This replaces the entire third column with the entire third column divided by five, then returns the modified data frame, so lapply() collects complete data frames rather than just the divided columns.

In practice:

> ASWldtSUM1=data.frame(Ethnicity=rep('ASW',10),Variant=c("ACCEPTOR","CDS","CGA_CNVWIN","CGA_MIRB","DELETE","DISRUPT","DONOR","FRAMESHIFT","INSERT","INTRON"), Sum=c(1,68,1000,0,0,0,0,0,1,54))
> #created a first data.frame (equal to your example)
> ASWldtSUM2=data.frame(Ethnicity=rep('ASW',10),Variant=c("ACCEPTOR","CDS","CGA_CNVWIN","CGA_MIRB","DELETE","DISRUPT","DONOR","FRAMESHIFT","INSERT","INTRON"), Sum=c(1,2,3,4,5,6,7,8,9,10))
> #created a second data.frame (with different values for the third column)
> ASWldtSUM=list(ASWldtSUM1,ASWldtSUM2)
> #created a list of data frames
> lapply(ASWldtSUM,FUN=function(x) { x[,3]=x[,3]/5; return(x) })
> #apply the function to divide third column to each nested data.frame
[[1]]
   Ethnicity    Variant   Sum
1        ASW   ACCEPTOR   0.2
2        ASW        CDS  13.6
3        ASW CGA_CNVWIN 200.0
4        ASW   CGA_MIRB   0.0
5        ASW     DELETE   0.0
6        ASW    DISRUPT   0.0
7        ASW      DONOR   0.0
8        ASW FRAMESHIFT   0.0
9        ASW     INSERT   0.2
10       ASW     INTRON  10.8

[[2]]
   Ethnicity    Variant Sum
1        ASW   ACCEPTOR 0.2
2        ASW        CDS 0.4
3        ASW CGA_CNVWIN 0.6
4        ASW   CGA_MIRB 0.8
5        ASW     DELETE 1.0
6        ASW    DISRUPT 1.2
7        ASW      DONOR 1.4
8        ASW FRAMESHIFT 1.6
9        ASW     INSERT 1.8
10       ASW     INTRON 2.0
> #desired result
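The question also asked about appending a 4th column instead of overwriting Sum. A minimal sketch along the same lines as the answer, assuming a hypothetical column name `SumDiv5` and a small hypothetical list `lst`:

```r
# Hypothetical list of two data frames shaped like the question's
lst <- list(
  data.frame(Ethnicity = "ASW", Variant = c("ACCEPTOR", "CDS"), Sum = c(1, 68)),
  data.frame(Ethnicity = "ASW", Variant = c("DELETE", "INTRON"), Sum = c(0, 54))
)

# Add a new column instead of overwriting Sum ("SumDiv5" is a made-up name)
with_new <- lapply(lst, function(x) {
  x$SumDiv5 <- x[, 3] / 5  # same computation as the answer above
  x                        # return the whole data frame so lapply() keeps it
})

ncol(with_new[[1]])  # 4
with_new[[2]]
```

Keeping Sum and adding a separate column leaves the original counts available for later checks.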

OTHER TIPS

There are many ways of doing this. Here is one:

Create some sample data:

dat <- lapply(1:3, function(x)data.frame(a=sample(letters, 4), b=sample(LETTERS, 4), z=rnorm(4)))

dat
[[1]]
  a b          z
1 r M  0.3054329
2 v I -0.8051859
3 t Q -1.6082701
4 u D -0.2315290

[[2]]
  a b          z
1 j W -0.4692469
2 f S  0.3112689
3 a D  0.4208704
4 w Z  0.6903139

[[3]]
....

Next, use a small anonymous function inside lapply(). For better illustration, I multiply by 100 rather than divide by 5:

lapply(dat, function(x){x[3] <- x[3]*100; x})

[[1]]
  a b          z
1 r M   30.54329
2 v I  -80.51859
3 t Q -160.82701
4 u D  -23.15290

[[2]]
  a b         z
1 j W -46.92469
2 f S  31.12689
3 a D  42.08704
4 w Z  69.03139

[[3]]
....
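This answer indexes with x[3] while the accepted answer uses x[,3]. A short sketch on a hypothetical two-row data frame shows what each form returns and that both work for replacing the column:

```r
# Hypothetical data frame with the same column names as the sample above
df <- data.frame(a = c("r", "v"), b = c("M", "I"), z = c(0.3, -0.8),
                 stringsAsFactors = FALSE)

class(df[3])     # "data.frame": single brackets keep a one-column data frame
class(df[, 3])   # "numeric": [ , 3] drops to the column vector
class(df[[3]])   # "numeric": double brackets also extract the vector

# Either form can be used to replace the column inside lapply()
f1 <- function(x) { x[3] <- x[3] * 100; x }
f2 <- function(x) { x[[3]] <- x[[3]] * 100; x }
all.equal(f1(df), f2(df))  # TRUE
```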
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow