Value of whisker in boxplot for 99.7 coverage
13-12-2019
Question
I am trying to identify outliers from a boxplot using MATLAB. The function has a default whisker value of 1.5, which corresponds to +/-2.7*sigma, or 99.3% coverage, for normally distributed data. However, I want 99.7%, or 3*sigma, coverage. What should the whisker value be in this case? I did not want to make a random guess, so I need help from you guys. Thanks
Solution
In general, let:
Q1 = icdf('norm',0.25,0,1);
Q3 = icdf('norm',0.75,0,1);
IQR = Q3-Q1;
Now if you have a constant k (BOXPLOT by default has k = 1.5 for the whisker length), then the IQR outlier test identifies values outside the range [Q1 - k*IQR, Q3 + k*IQR] as outliers, which corresponds to:
>> k = 1.5;
>> sdCov = [Q1 - k*IQR, Q3 + k*IQR]   %# +/-2.698*sigma coverage
sdCov =
   -2.698    2.698
or (in terms of area under the curve):
>> area = 2*normcdf(sdCov(2), 0, 1)-1   %# 99.3% coverage
area =
    0.99302
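The two numbers above can also be read off a closed-form relation. The following is a short derivation sketch, assuming normally distributed data and writing z_{0.75} for the 0.75 quantile of the standard normal (a symbol introduced here, not in the original answer):

```latex
\begin{aligned}
Q_3 &= -Q_1 = z_{0.75} \approx 0.6745, \qquad \mathrm{IQR} = Q_3 - Q_1 = 2\,z_{0.75},\\
Q_3 + k\,\mathrm{IQR} &= (1 + 2k)\,z_{0.75},\\
\text{coverage}(k) &= 2\,\Phi\bigl((1 + 2k)\,z_{0.75}\bigr) - 1 .
\end{aligned}
```

For k = 1.5 the upper fence is 4*z_{0.75}, about 2.698, which matches the sdCov value printed above.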
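In the opposite direction, if you want sdCov*sigma coverage, solve Q3 + k*IQR = sdCov for k. Since Q1 = -Q3 for the standard normal, (sdCov - Q3)/IQR can be written as:
>> sdCov = 3;
>> k = (Q1+sdCov)/IQR
k =
    1.7239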
or, starting from the desired area under the curve:
>> area = 0.9973;
>> sdCov = norminv(1-(1-area)/2);
>> k = (Q1+sdCov)/IQR
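To avoid repeating these steps for other coverage levels, the two directions can be wrapped in small function handles. This is only a sketch: the names coverageOf and whiskerFor are made up for illustration, it assumes normally distributed data, and it needs the Statistics Toolbox for norminv and normcdf.

```matlab
% Quartiles of the standard normal distribution.
Q1  = norminv(0.25);
Q3  = norminv(0.75);
IQR = Q3 - Q1;

% Hypothetical helper names, introduced here for illustration only.
% Whisker multiplier k -> two-sided area inside the fences.
coverageOf = @(k) 2*normcdf(Q3 + k*IQR) - 1;
% Two-sided area -> whisker multiplier k (inverse of coverageOf).
whiskerFor = @(area) (norminv(1 - (1 - area)/2) - Q3) / IQR;

k3  = whiskerFor(2*normcdf(3) - 1);   % whisker for exactly 3*sigma
c15 = coverageOf(1.5);                % coverage of the default whisker
```

Here k3 should come out close to the 1.7239 computed above, and c15 close to 0.993.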
Therefore use the following in your case:
boxplot(data, 'whisker',1.7239)
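If you need the outlier values themselves rather than only the plot, the same fences can be applied by hand. This is a minimal sketch with made-up sample data; it assumes the sample quartiles returned by quantile agree with the ones boxplot uses, which is worth checking on your own data.

```matlab
% Hypothetical sample data with two planted extreme values;
% replace with your own vector.
rng(0);
data = [randn(1000,1); 6; -7];

k = 1.7239;                         % whisker multiplier for ~3*sigma coverage
q = quantile(data, [0.25 0.75]);    % sample quartiles Q1 and Q3
iqrData = q(2) - q(1);              % sample interquartile range

lowerFence = q(1) - k*iqrData;
upperFence = q(2) + k*iqrData;

% Logical mask of points outside [lowerFence, upperFence].
outlierMask = data < lowerFence | data > upperFence;
outliers    = data(outlierMask);
```

The 3*sigma interpretation only holds if the data are roughly normal; for skewed or heavy-tailed data the same k will flag a different fraction of points.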
Here is an illustration borrowed from Wikipedia: