Question

Maybe this is a very easy question. This is my data.frame:

> read.table("text.txt")
   V1       V2
1  26    22516
2  28    17129
3  30    38470
4  32    12920
5  34    30835
6  36    36244
7  38    24482
8  40    67482
9  42    23121
10 44    51643
11 46    61064
12 48    37678
13 50    98817
14 52    31741
15 54    74672
16 56    85648
17 58    53813
18 60   135534
19 62    46621
20 64    89266
21 66    99818
22 68    60071
23 70   168558
24 72    67059
25 74   194730
26 76   278473
27 78   217860

It means that I have 22516 sequences of length 26, 17129 sequences of length 28, etc. I would like to know the mean sequence length and its standard deviation. I know one way to do it: create a vector with 26 repeated 22516 times and so on, and then compute the mean and SD. However, I think there is an easier method. Any ideas?

Thanks.


Solution

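For mean: (V1 %*% V2)/sum(V2)

For SD: sqrt(((V1 - c(V1 %*% V2)/sum(V2))^2 %*% V2)/sum(V2))

Here V1 and V2 are the two columns of the data.frame. The c() turns the 1x1 matrix returned by %*% into a plain number so it can be subtracted from the vector V1. Because this divides by sum(V2) rather than sum(V2) - 1, it is the population SD; with about two million sequences the difference from the sample SD is negligible.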
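The same calculation can be written without matrix products. The following is a minimal sketch in base R using the table from the question; the names seqs, len_mean, ss, sd_pop and sd_sample are illustrative and not part of the original answer.

```r
# Length/count table from the question (V1 = sequence length, V2 = count).
seqs <- data.frame(
  V1 = seq(26, 78, by = 2),
  V2 = c(22516, 17129, 38470, 12920, 30835, 36244, 24482, 67482, 23121,
         51643, 61064, 37678, 98817, 31741, 74672, 85648, 53813, 135534,
         46621, 89266, 99818, 60071, 168558, 67059, 194730, 278473, 217860)
)

n <- sum(seqs$V2)                       # total number of sequences
len_mean <- sum(seqs$V1 * seqs$V2) / n  # count-weighted mean length

# Count-weighted sum of squared deviations from the mean.
ss <- sum(seqs$V2 * (seqs$V1 - len_mean)^2)

sd_pop <- sqrt(ss / n)           # population SD (divide by n)
sd_sample <- sqrt(ss / (n - 1))  # sample SD (divide by n - 1), as sd() does

print(c(mean = len_mean, sd_pop = sd_pop, sd_sample = sd_sample))
```

Which divisor is appropriate depends on whether the table is treated as the whole population or as a sample from it.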

OTHER TIPS

I do not find mean(rep(V1,V2)) # 61.902 and sd(rep(V1,V2)) # 14.23891 that complex, but alternatively you might try:

weighted.mean(V1,V2) # 61.902
# recipe from http://www.ltcconline.net/greenl/courses/201/descstat/meansdgrouped.htm
sqrt((sum((V1^2)*V2)-(sum(V1*V2)^2)/sum(V2))/(sum(V2)-1)) # 14.23891
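To see that the expanded and the grouped calculations agree, here is a small check on a toy table. The values in len and cnt are made up for illustration and are not the data from the question.

```r
# Illustrative toy table (values are made up for this check).
len <- c(10, 12, 14)  # group values
cnt <- c(3, 5, 2)     # how often each value occurs

# Route 1: expand the table, then use the built-in functions.
expanded <- rep(len, cnt)
m1 <- mean(expanded)
s1 <- sd(expanded)

# Route 2: work on the grouped table directly.
m2 <- weighted.mean(len, cnt)
s2 <- sqrt((sum(len^2 * cnt) - sum(len * cnt)^2 / sum(cnt)) / (sum(cnt) - 1))

print(all.equal(m1, m2))
print(all.equal(s1, s2))
```

One caveat: the shortcut formula subtracts two large, similar numbers, so it can lose precision when the mean is large relative to the spread. Summing squared deviations from the mean is the more robust form in that case.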

Step 1: Set up the data:

dat.df <- read.table(text="id   V1       V2
1  26    22516
2  28    17129
3  30    38470
4  32    12920
5  34    30835
6  36    36244
7  38    24482
8  40    67482
9  42    23121
10 44    51643
11 46    61064
12 48    37678
13 50    98817
14 52    31741
15 54    74672
16 56    85648
17 58    53813
18 60   135534
19 62    46621
20 64    89266
21 66    99818
22 68    60071
23 70   168558
24 72    67059
25 74   194730
26 76   278473
27 78   217860", header=TRUE)

Step 2: Convert to data.table (only for simplicity and laziness in typing)

library(data.table)
dat <- data.table(dat.df)

Step 3: Set up new columns with the products, and use them to find the mean and SD

dat[,pr:=V1*V2]
dat[,v1sq:=as.numeric(V1*V1*V2)]

dat.Mean <- sum(dat$pr)/sum(dat$V2)

dat.SD <- sqrt( (sum(dat$v1sq)/sum(dat$V2)) - dat.Mean^2)
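The answer does not say why as.numeric() is used for the squared column. One plausible reading is that it guards against integer overflow: read.table() stores whole-number columns as integers, and for this data the total of V1*V1*V2 runs to several billion, beyond R's 32-bit integer limit. A minimal sketch of that behaviour, using a made-up vector x:

```r
# Two integers whose sum exceeds the largest representable integer.
x <- c(.Machine$integer.max, 1L)

# Summing integers past the 32-bit limit yields NA (R also warns).
int_sum <- suppressWarnings(sum(x))

# Converting to double before summing avoids the overflow.
dbl_sum <- sum(as.numeric(x))

print(int_sum)
print(dbl_sum)
```

Writing V1^2 * V2 instead of V1 * V1 * V2 has the same effect, since ^ returns a double.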

Hope this helps!!

MEAN = sum(V1*V2)/sum(V2)

SD = sqrt(sum(V1^2*V2)/sum(V2) - MEAN^2)

Licensed under: CC-BY-SA with attribution