For mean: (V1 %*% V2)/sum(V2)
For SD: sqrt(((V1-(V1 %*% V2)/sum(V2))**2 %*% V2)/sum(V2))
Question
maybe it is a very easy question. This is my data.frame:
> read.table("text.txt")
V1 V2
1 26 22516
2 28 17129
3 30 38470
4 32 12920
5 34 30835
6 36 36244
7 38 24482
8 40 67482
9 42 23121
10 44 51643
11 46 61064
12 48 37678
13 50 98817
14 52 31741
15 54 74672
16 56 85648
17 58 53813
18 60 135534
19 62 46621
20 64 89266
21 66 99818
22 68 60071
23 70 168558
24 72 67059
25 74 194730
26 76 278473
27 78 217860
It means that I have 22516 sequences with length 26, 17129 sequences with length 28, etc. I would like to know the sequence length mean and its standard deviation. I know how to do it, but I know to do it creating a list full of 26 repeated 22516 times and so on... and then compute the mean and SD. However, I thing there is a easier method. Any idea?
Thanks.
Solution
For mean: (V1 %*% V2)/sum(V2)
For SD: sqrt(((V1-(V1 %*% V2)/sum(V2))**2 %*% V2)/sum(V2))
OTHER TIPS
I do not find mean(rep(V1,V2)) # 61.902
and sd(rep(V1,V2)) # 14.23891
that complex, but alternatively you might try:
weighted.mean(V1,V2) # 61.902
# recipe from http://www.ltcconline.net/greenl/courses/201/descstat/meansdgrouped.htm
sqrt((sum((V1^2)*V2)-(sum(V1*V2)^2)/sum(V2))/(sum(V2)-1)) # 14.23891
Step1: Set up data:
dat.df <- read.table(text="id V1 V2
1 26 22516
2 28 17129
3 30 38470
4 32 12920
5 34 30835
6 36 36244
7 38 24482
8 40 67482
9 42 23121
10 44 51643
11 46 61064
12 48 37678
13 50 98817
14 52 31741
15 54 74672
16 56 85648
17 58 53813
18 60 135534
19 62 46621
20 64 89266
21 66 99818
22 68 60071
23 70 168558
24 72 67059
25 74 194730
26 76 278473
27 78 217860",header=T)
Step2: Convert to data.table
(only for simplicity and laziness in typing)
library(data.table)
dat <- data.table(dat.df)
Step3: Set up new columns with products, and use them to find mean
dat[,pr:=V1*V2]
dat[,v1sq:=as.numeric(V1*V1*V2)]
dat.Mean <- sum(dat$pr)/sum(dat$V2)
dat.SD <- sqrt( (sum(dat$v1sq)/sum(dat$V2)) - dat.Mean^2)
Hope this helps!!
MEAN = (V1*V2)/sum(V2)
SD = sqrt((V1*V1*V2)/sum(V2) - MEAN^2)