Calculating variance and standard deviation in a single pass
28-06-2022
Problem
I have a very large network trace file with two timestamps on each packet. I calculate the difference between the timestamps for each pair of consecutive packets:
delta_ts1 = ts1(packet N) - ts1(packet N-1)
delta_ts2 = ts2(packet N) - ts2(packet N-1)
Assume ts2 is the reference value and I want to test ts1 against ts2.
The variance is then ts_variance = sum((delta_ts2 - mean_ts)^2) / packet_count
Now, the problem with the above approach is that I don't get the mean until I reach the end of the file. I want to achieve this in a single pass, so I am thinking of using an approach like the one below:
running_mean_till_now += ts2/packet_count_till_now
ts_variance = (delta_ts2 - running_mean_till_now)^2/packet_count_till_now
Is this approach acceptable? How accurate will the estimated variance, and hence the standard deviation, be with this approach?
Solution
The formula is not quite right. Here you have a description of an online algorithm which you can use.
Other tips
First of all, without doing any research, I can tell you that it is possible to compute the running mean of a series of numbers without having to rescan the whole series each time.
The basic idea: say you have the mean of four numbers (2, 3, 4, 1, so mean = 10/4 = 2.5). Now your code reads the fifth number (say 5). The new mean is (10/4 * 4 + 5) / 5 = 15 / 5 = 3.
Now, when you read the sixth number (say 9), the next mean is (3 * 5 + 9) / 6 = 24 / 6 = 4.
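This incremental mean update can be sketched in a few lines of Python; the function name is mine, and the numbers mirror the example above:

```python
def running_mean(values):
    """Update the mean one value at a time, without rescanning the series."""
    mean = 0.0
    for n, x in enumerate(values, start=1):
        # Equivalent to (mean * (n - 1) + x) / n, rearranged so no
        # large running sum is kept.
        mean += (x - mean) / n
    return mean

print(running_mean([2, 3, 4, 1, 5]))     # -> 3.0
print(running_mean([2, 3, 4, 1, 5, 9]))  # -> 4.0
```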
The link given by Mihai Maruseac shows the symbolic derivation behind this example, and it also shows how to compute the "running" (online) standard deviation.
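The usual single-pass method for this is Welford's online algorithm, which maintains the mean and the sum of squared deviations together as each value arrives. A minimal sketch (function and variable names are mine, not from the linked answer):

```python
import math

def online_mean_variance(values):
    """Welford's online algorithm: mean and variance in a single pass."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # note: uses the *updated* mean
    variance = m2 / n  # population variance; divide by (n - 1) for the sample variance
    return mean, variance, math.sqrt(variance)

mean, var, std = online_mean_variance([2, 3, 4, 1, 5])
print(mean, var)  # -> 3.0 2.0
```

Applied to the question, you would feed each delta_ts2 into the loop as it is computed, so the file is read exactly once. This is also numerically more stable than accumulating sum(x) and sum(x^2) and subtracting at the end.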