Question

I am trying to identify outliers from a boxplot using MATLAB. The boxplot function has a default whisker value of 1.5, which gives about +/-2.7*sigma, or 99.3% coverage. However, I want 99.7% (3*sigma) coverage. What whisker value should I use in this case? I would rather not make a random guess, so any help is appreciated. Thanks

Solution

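In general, for a standard normal distribution, let:

    Q1 = icdf('norm',0.25,0,1);    %# first quartile, about -0.6745
    Q3 = icdf('norm',0.75,0,1);    %# third quartile, about +0.6745
    IQR = Q3-Q1;                   %# interquartile range, about 1.349
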
  • Given a constant k (BOXPLOT uses k=1.5 by default for the whisker length), the IQR outlier test flags values outside the range [Q1 - k*IQR, Q3 + k*IQR] as outliers. For normally distributed data, this corresponds to:

    >> k = 1.5;
    >> sdCov = [Q1 - k*IQR, Q3 + k*IQR]      %# +/-2.698*sigma coverage
    sdCov =
           -2.698        2.698
    

    or (in terms of area under the curve):

    >> area = 2*normcdf(sdCov(2), 0, 1)-1    %# 99.3% coverage
    area =
           0.99302
    
  • In the opposite direction, if you want sdCov*sigma coverage, solve Q3 + k*IQR = sdCov for k:

    >> sdCov = 3;
    >> k = (Q1+sdCov)/IQR                    %# same as (sdCov-Q3)/IQR, since Q1 = -Q3
    k =
           1.7239
    

    or, starting from the desired area under the curve:

    >> area = 0.9973;
    >> sdCov = norminv(1-(1-area)/2);        %# about 3
    >> k = (Q1+sdCov)/IQR                    %# again about 1.7239
    

    Therefore use the following in your case:

    boxplot(data, 'whisker',1.7239)
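
As a rough sanity check of the numbers above, here is a minimal sketch that wraps the conversion in two helpers and then counts outliers by hand on synthetic data. The names `sigmaToK`, `kToSigma`, `k3`, `lims` and `outlierMask` are illustrative and not part of the original answer, and the random `data` vector only stands in for real measurements. It assumes the data are roughly normal and that the Statistics Toolbox (`norminv`, `quantile`) is available.

```matlab
% Standard normal quartiles
Q1 = norminv(0.25);
Q3 = norminv(0.75);

% Illustrative helpers: convert between whisker constant k and sigma coverage
sigmaToK = @(nSigma) (nSigma - Q3) ./ (Q3 - Q1);   % solves Q3 + k*IQR = nSigma
kToSigma = @(k) Q3 + k .* (Q3 - Q1);               % upper whisker limit in sigmas

k3 = sigmaToK(3);            % whisker constant for +/-3*sigma coverage

% Synthetic normal data (assumption: a stand-in for the real data)
rng(0);                      % fixed seed so the run is repeatable
data = randn(1e5, 1);

% Apply the IQR outlier test by hand using the sample quartiles
q = quantile(data, [0.25 0.75]);
lims = [q(1) - k3*diff(q), q(2) + k3*diff(q)];
outlierMask = data < lims(1) | data > lims(2);

% Fraction of points inside the whiskers; expected to be near 0.9973
fracInside = 1 - mean(outlierMask);
```

The logical vector `outlierMask` marks the points that fall outside the whisker limits, so `data(outlierMask)` returns the outlier values directly. If the real data are clearly non-normal, the 99.7% figure no longer applies, although the whisker rule itself still works.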
    

For an illustration of how the boxplot whiskers relate to the normal distribution, see Wikipedia:

http://en.wikipedia.org/wiki/Interquartile_range

Licensed under: CC-BY-SA with attribution