Question

I have a very large network trace file with two timestamps on each packet. I calculate the difference between the timestamps for each pair of consecutive packets:

delta_ts1 = ts1(packet N) - ts1(packet N-1)
delta_ts2 = ts2(packet N) - ts2(packet N-1)

Assume ts2 is the reference value and I want to test ts1 against ts2.

And the variance ts_variance = sum over all packets of (delta_ts2 - mean_ts)^2 / packet_count

Now the problem with the above approach is that I don't get the mean until I reach the end of the file. I want to achieve this in a single pass over the file, so I am thinking of using an approach like the one below:

running_mean_till_now += ts2/packet_count_till_now

ts_variance = (delta_ts2 - running_mean_till_now)^2/packet_count_till_now

Is this approach acceptable? How accurate will the estimated variance, and hence the standard deviation, be using this approach?


Solution

The formula is not quite right. There is an online algorithm you can use instead; the standard one for computing a running mean and variance in a single pass is Welford's algorithm.
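Here is a minimal sketch of that update in Python (not part of the original answer; the function name and the example values are purely illustrative). In practice you would feed it the delta_ts2 values as they are parsed from the trace file, so nothing has to be stored or re-read:

import math

def online_mean_variance(values):
    # Welford's single-pass algorithm: keeps only a count, the running mean,
    # and the running sum of squared differences from the current mean.
    count = 0
    mean = 0.0
    m2 = 0.0
    for x in values:
        count += 1
        d = x - mean
        mean += d / count        # update the running mean
        m2 += d * (x - mean)     # note: uses the already-updated mean
    if count == 0:
        return float("nan"), float("nan"), float("nan")
    variance = m2 / count        # population variance; use (count - 1) for the sample variance
    return mean, variance, math.sqrt(variance)

# Example with made-up delta values:
mean, variance, std_dev = online_mean_variance([0.5, 0.7, 0.4, 0.6, 0.8])
print(mean, variance, std_dev)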

Other tips

First of all, without doing any research, I can tell you that it is possible to compute the running mean of a series of numbers without having to rescan the series each time.

The basic idea: say you have the mean of four numbers (2, 3, 4, 1; mean = 10/4 = 2.5). Now your code reads the fifth number (say 5). Compute the new mean as (2.5 * 4 + 5) / 5 = 15/5 = 3.

When you read the sixth number (say 9), the next mean is (3 * 5 + 9) / 6 = 24/6 = 4.
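Expressed as code (a small illustrative sketch, not from the original answer), the update only needs the current mean and the count of values seen so far:

def update_mean(old_mean, old_count, new_value):
    # Mean of old_count numbers, updated when one more value arrives.
    return (old_mean * old_count + new_value) / (old_count + 1)

mean = update_mean(10 / 4, 4, 5)   # (2.5 * 4 + 5) / 5 = 3.0
mean = update_mean(mean, 5, 9)     # (3.0 * 5 + 9) / 6 = 4.0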

The link given by Mihai Maruseac shows the symbolic calculations behind this example, and it also shows how to compute the "running" (online) standard deviation.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow