Calculate the mean by group, excluding the first value in each group, in R

https://stackoverflow.com/questions/20709176

20-09-2022

Question

I have a big dataframe like this:

groupvar <- c("A", "A", "A", "A",  "B", "B", "B", "C",  "C", "C", "C", "D", "D", "D", "E", "E")
valuevar <- c( 1,  0.5, 0.5, 0.5,  1, 0.75, 0.75, 1, 0.8, 0.8, 0.8,    1, 0.9, 0.9,  1, 1.5)
myd <- data.frame (groupvar, valuevar)

   groupvar valuevar
1         A     1.00
2         A     0.50
3         A     0.50
4         A     0.50
5         B     1.00
6         B     0.75
7         B     0.75
8         C     1.00
9         C     0.80
10        C     0.80
11        C     0.80
12        D     1.00
13        D     0.90
14        D     0.90
15        E     1.00
16        E     1.50

I would like to calculate the mean of valuevar within each groupvar, but excluding the first value of each group (in this example the first value in every group is 1). For example, for group "A" the mean should be based on 0.5, 0.5, 0.5, ignoring the leading 1.

This is what I was thinking:

meanfun <- function(x)sum(x)-x[1]/ length(x)
ddply (myd,"groupvar",meanfun) 

Error in FUN(X[[1L]], ...) : 
  only defined on a data frame with all numeric variables

Solution

You can drop the first element of each group with x[-1] before taking the mean. Using tapply

> with(myd, tapply(valuevar, groupvar, function(x) mean(x[-1])))
   A    B    C    D    E 
0.50 0.75 0.80 0.90 1.50

Using aggregate

> aggregate(valuevar ~ groupvar, FUN=function(x) mean(x[-1]), data=myd)
  groupvar valuevar
1        A     0.50
2        B     0.75
3        C     0.80
4        D     0.90
5        E     1.50

Using ddply

> library(plyr)
> ddply (myd, "groupvar", summarize, MeanVar=mean(valuevar[-1]))
  groupvar MeanVar
1        A    0.50
2        B    0.75
3        C    0.80
4        D    0.90
5        E    1.50
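
For reference, the attempt in the question appears to fail for two separate reasons. ddply passes each group to the function as a data frame rather than a numeric vector, so sum(x) runs into the non-numeric groupvar column and raises the error shown. Independently, operator precedence makes sum(x)-x[1]/length(x) divide only x[1], and the divisor should be length(x) - 1 once the first value is dropped. A minimal corrected sketch, using tapply so it runs without plyr (the name mean_without_first is only illustrative):

```r
groupvar <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "D", "D", "D", "E", "E")
valuevar <- c(1, 0.5, 0.5, 0.5, 1, 0.75, 0.75, 1, 0.8, 0.8, 0.8, 1, 0.9, 0.9, 1, 1.5)
myd <- data.frame(groupvar, valuevar)

# Subtract the first value from the sum, then divide by the
# number of remaining values (parentheses matter here).
mean_without_first <- function(x) (sum(x) - x[1]) / (length(x) - 1)

# tapply hands the function a plain numeric vector per group.
fixed_means <- with(myd, tapply(valuevar, groupvar, mean_without_first))
fixed_means
```

This should agree with the mean(x[-1]) results above; mean(x[-1]) is shorter and is probably the better choice in practice.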

OTHER TIPS

You could split the data by groupvar and apply the mean function.

groupvar <- c("A", "A", "A", "A",  "B", "B", "B", "C",  "C", "C", "C", "D", "D", "D", "E", "E")
valuevar <- c( 1,  0.5, 0.5, 0.5,  1, 0.75, 0.75, 1, 0.8, 0.8, 0.8,    1, 0.9, 0.9,  1, 1.5)
myd <- data.frame (groupvar, valuevar)

lapply(split(myd, f=myd[, "groupvar"]), function(x) mean(x[-1,2]))
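
Note that lapply returns a list with one element per group. If a named numeric vector is more convenient, sapply can be swapped in. A small sketch that also splits the value column directly instead of indexing it by position (group_means is an illustrative name):

```r
groupvar <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "D", "D", "D", "E", "E")
valuevar <- c(1, 0.5, 0.5, 0.5, 1, 0.75, 0.75, 1, 0.8, 0.8, 0.8, 1, 0.9, 0.9, 1, 1.5)
myd <- data.frame(groupvar, valuevar)

# Split the numeric column by group, drop the first element of
# each piece, and simplify the per-group means to a named vector.
group_means <- sapply(split(myd$valuevar, myd$groupvar), function(x) mean(x[-1]))
group_means
```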

Another option is to create a new data frame that drops the first row of each group (duplicated() is FALSE for the first occurrence of each groupvar value and TRUE for the rest), then take the means by groupvar.

myd_rmFstElement <- myd[which(duplicated(myd$groupvar)), ]
myd_means <- aggregate(valuevar ~ groupvar, FUN=mean, myd_rmFstElement)
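
One difference between the approaches is worth checking if some groups might contain only a single row. With mean(x[-1]) such a group is kept and gets NaN, whereas filtering with duplicated() removes the group from the result entirely. A small sketch on made-up data (myd2 and its group "F" are hypothetical, not part of the question):

```r
# Hypothetical data: group "F" has only one row.
myd2 <- data.frame(
  groupvar = c("A", "A", "A", "F"),
  valuevar = c(1, 0.5, 0.5, 1)
)

# Approach 1: drop the first element inside each group.
by_index <- with(myd2, tapply(valuevar, groupvar, function(x) mean(x[-1])))

# Approach 2: drop the first row of each group up front, then aggregate.
rest <- myd2[duplicated(myd2$groupvar), ]
by_filter <- aggregate(valuevar ~ groupvar, data = rest, FUN = mean)

by_index   # group F is present, with value NaN
by_filter  # group F is absent
```

Which behaviour is preferable depends on whether single-row groups should be reported or silently skipped.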

Licensed under: CC-BY-SA with attribution
