Question

We have this set of data that we need to get the average of a column. a select avg(x) from y does the trick. However we need a more accurate figure.

I figured that there must be a way of filtering records that has either too high or too low values(spikes) so that we can exclude them in calculating the average.

Was it helpful?

Solution

There are three types of average, and what you are originally using is the mean - the sum of all the values divided by the number of values.

You might find it more useful to get the mode - the most frequently occuring value:

select name,    
       (select top 1 h.run_duration
        from sysjobhistory h
        where h.step_id = 0
        and h.job_id = j.job_id
        group by h.run_duration
        order by count(*) desc) run_duration
from sysjobs j

If you did want to get rid of any values outside the original standard deviation, you could find the average and the standard deviation in a subquery, eliminate those values which are outside the range : average +- standard deviation, then do a further average of the remaining values, but you start running the risk of having meaningless values:

select oh.job_id, avg(oh.run_duration) from sysjobhistory oh
inner join (select job_id, avg(h.run_duration) avgduration, 
            stdev(h.run_duration) stdev_duration 
            from sysjobhistory h 
            group by job_id) as m on m.job_id = oh.job_id
where oh.step_id = 0
and abs(oh.run_duration - m.avgduration) <  m.stdev_duration
group by oh.job_id

OTHER TIPS

in sql server there's also the STDEV function so maybe that can be of some help...

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top