Question

I have a very large network trace file with two timestamps on each packet. I calculate the difference between the timestamps for each pair of consecutive packets:

delta_ts1 = ts1(packet N) - ts1(packet N-1)
delta_ts2 = ts2(packet N) - ts2(packet N-1)

Assume ts2 is the reference value and I want to test ts1 against ts2.

And the variance ts_variance = sum over all packets of (delta_ts2 - mean_ts)^2 / packet_count

Now the problem with the above approach is that I don't get the mean until I reach the end of the file. I want to achieve this in a single pass over the file, so I am thinking of using an approach like the one below:

running_mean_till_now += ts2/packet_count_till_now

ts_variance = (delta_ts2 - running_mean_till_now)^2/packet_count_till_now

Is this approach acceptable? How accurate will the estimated variance, and hence the standard deviation, be using this approach?


Solution

The formula is not quite right. There is an online algorithm you can use instead; the standard one for computing a running mean and variance in a single pass is Welford's algorithm.
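Here is a minimal sketch of that update in Python (not part of the original answer; the function name and the example values are purely illustrative). In practice you would feed it the delta_ts2 values as they are parsed from the trace file, so nothing has to be stored or re-read:

import math

def online_mean_variance(values):
    # Welford's single-pass algorithm: keeps only a count, the running mean,
    # and the running sum of squared differences from the current mean.
    count = 0
    mean = 0.0
    m2 = 0.0
    for x in values:
        count += 1
        d = x - mean
        mean += d / count        # update the running mean
        m2 += d * (x - mean)     # note: uses the already-updated mean
    if count == 0:
        return float("nan"), float("nan"), float("nan")
    variance = m2 / count        # population variance; use (count - 1) for the sample variance
    return mean, variance, math.sqrt(variance)

# Example with made-up delta values:
mean, variance, std_dev = online_mean_variance([0.5, 0.7, 0.4, 0.6, 0.8])
print(mean, variance, std_dev)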

Other tips

First of all, without doing any research, I can tell you that it is possible to compute the running mean of a series of numbers without having to rescan the series each time.

The basic idea: say you have the mean of four numbers (2, 3, 4, 1; mean = 10/4 = 2.5). Now your code reads the fifth number (say 5). Compute the new mean as (2.5 * 4 + 5) / 5 = 15/5 = 3.

When you read the sixth number (say 9), the next mean is (3 * 5 + 9) / 6 = 24/6 = 4.
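Expressed as code (a small illustrative sketch, not from the original answer), the update only needs the current mean and the count of values seen so far:

def update_mean(old_mean, old_count, new_value):
    # Mean of old_count numbers, updated when one more value arrives.
    return (old_mean * old_count + new_value) / (old_count + 1)

mean = update_mean(10 / 4, 4, 5)   # (2.5 * 4 + 5) / 5 = 3.0
mean = update_mean(mean, 5, 9)     # (3.0 * 5 + 9) / 6 = 4.0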

The link given by Mihai Maruseac shows the symbolic calculations behind this example, and it also shows how to compute the "running" (online) standard deviation.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow