Question

In our log files we store the response time of each request. What's the most efficient way to calculate the median response time, and figures such as "75/90/95% of requests were served in less than N time"? (I guess a variation of my question is: what's the best way to calculate the median and standard deviation of a stream of numbers?)

The best I came up with was just reading all the numbers, sorting them, and then picking out the values at the relevant positions, but that seems really goofy. Isn't there a smarter way?

We use Perl, but solutions for any language might be helpful.


Solution

See the article Calculating Percentiles in Memory-bound Applications. It explains how to calculate median and other percentiles efficiently.
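
As a rough illustration of one memory-bounded approach (not necessarily the one the article describes), you can count response times into fixed-width buckets and read percentiles off the cumulative counts. This is a minimal sketch in Python. The class name `LatencyHistogram`, the 10 ms bucket width, and the 10 s cap are all assumptions chosen for the example, and the answer is only accurate to within one bucket width.

```python
import math


class LatencyHistogram:
    """Fixed-width bucket counts; memory is bounded by the number of buckets."""

    def __init__(self, bucket_ms=10, max_ms=10_000):
        # bucket_ms and max_ms are illustrative defaults, not recommendations.
        self.bucket_ms = bucket_ms
        # The last bucket also absorbs anything above max_ms (overflow).
        self.counts = [0] * (max_ms // bucket_ms + 1)
        self.total = 0

    def add(self, ms):
        """Record one response time, given in milliseconds."""
        idx = min(int(ms // self.bucket_ms), len(self.counts) - 1)
        self.counts[idx] += 1
        self.total += 1

    def percentile(self, p):
        """Upper edge (ms) of the bucket holding the p-th percentile (nearest rank)."""
        if self.total == 0:
            raise ValueError("no samples recorded")
        rank = max(1, math.ceil(p * self.total / 100))
        seen = 0
        for idx, count in enumerate(self.counts):
            seen += count
            if seen >= rank:
                return (idx + 1) * self.bucket_ms
        raise AssertionError("unreachable: rank never exceeds total")


hist = LatencyHistogram()
for ms in range(1, 1001):  # stand-in for response times parsed from a log
    hist.add(ms)
for p in (50, 75, 90, 95):
    print(f"{p}% of requests were served in under {hist.percentile(p)} ms")
```

The trade-off is precision for memory: the log is read once, nothing is sorted, and storage stays constant no matter how many requests there are. If exact values are required, sorting (or a selection algorithm) is still the straightforward route.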

Also, here's an article on calculating standard deviation (variance) as you go: Accurately computing running variance.
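
For the standard deviation, a commonly used single-pass update is the one usually attributed to Welford: keep a count, a running mean, and a running sum of squared deviations, and update all three as each number arrives. The sketch below is in Python and may differ in detail from the linked article. The names `RunningStats`, `add`, `variance` and `stddev` are made up for the example.

```python
import math


class RunningStats:
    """Single-pass mean and variance using Welford-style updates."""

    def __init__(self):
        self.n = 0       # number of samples seen so far
        self.mean = 0.0  # running mean
        self.m2 = 0.0    # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the old mean (via delta) and the updated mean together.
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance (divides by n - 1); 0.0 with fewer than two samples."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def stddev(self):
        return math.sqrt(self.variance())


stats = RunningStats()
for x in [2, 4, 4, 4, 5, 5, 7, 9]:  # stand-in for response times from a log
    stats.add(x)
print(stats.mean, stats.stddev())
```

This avoids the naive "sum of squares minus square of sum" formula, which can lose precision badly when the values are large relative to their spread. Note that the median cannot be maintained this way; it needs either the full data or an approximation such as bucketed counts.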

Licensed under: CC-BY-SA with attribution