How do I turn the numeric output of boxplot (with plot=FALSE) into something usable?

StackOverflow https://stackoverflow.com/questions/8844845

  •  27-10-2019

Question

I'm successfully using the boxplot function to generate... boxplots. Now I need to generate tables containing the stats that boxplot calculates in order to create plots.

I do this by using the plot=FALSE option.

The problem is that this produces data in a rather bizarre format that I simply can't do anything with. Here's an example:

structure(list(stats = structure(c(178.998262143545, 182.227431564442, 
202.108456373209, 220.375358994654, 221.990406228232, 216.59986775699, 
217.054997032148, 228.509462713206, 267.070720949859, 284.832378859975, 
189.864120937198, 201.876421960518, 219.525439081472, 234.260088973545, 
279.343359793024, 209.472617639903, 209.526516071858, 214.785213079737, 
230.027361556731, 240.0647114578, 202.057148813419, 207.375619207685, 
220.093663781351, 226.246698737471, 240.343646265795), .Dim = c(5L, 
5L)), n = c(4, 6, 8, 4, 8), conf = structure(c(171.971593703341, 
232.245319043076, 196.247705331772, 260.771220094641, 201.435457751239, 
237.615420411705, 198.589545146688, 230.980881012787, 209.552007821332, 
230.635319741371), .Dim = c(2L, 5L)), out = numeric(0), group = numeric(0), 
names = c("U", "UM", "M", "LM", "L")), .Names = c("stats", "n", "conf", "out", "group", 
"names"))

What I want is a table for each of the stats -- min, max, median and the quartiles -- and their values for each group (the ones in "names").

Could somebody give me a hand with this? I'm very much an R beginner.

Thanks in advance!

Solution

boxplot returns a structure in R called a list.

A list is more-or-less a data container where you can refer to elements by name. If you do A <- boxplot(...), you can access the names with A$names, the conf with A$conf, etc.
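As a small illustration of accessing list elements by name, here is a sketch that uses R's built-in ToothGrowth data set (my choice of stand-in data, not the asker's):

```r
# ToothGrowth ships with R; it stands in for the asker's data here.
A <- boxplot(len ~ dose, data = ToothGrowth, plot = FALSE)

names(A)       # the element names of the returned list
A$n            # number of observations in each group
A[["names"]]   # same as A$names: the group labels
```

Both `$` and `[[ ]]` pull one element out of the list; `[[ ]]` is handy when the element name is stored in a variable.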

So, looking at the boxplot helpfile ?boxplot under Value: (which tells you what boxplot returns), we see that it returns a list with the following components:

   stats: a matrix, each column contains the extreme of the lower
          whisker, the lower hinge, the median, the upper hinge and the
          extreme of the upper whisker for one group/plot.  If all the
          inputs have the same class attribute, so will this component.
       n: a vector with the number of observations in each group.    
    conf: a matrix where each column contains the lower and upper
          extremes of the notch.    
     out: the values of any data points which lie beyond the extremes
          of the whiskers.    
   group: a vector of the same length as ‘out’ whose elements indicate
          to which group the outlier belongs.    
   names: a vector of names for the groups.
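To see how out and group relate to the other components, here is a minimal sketch with two made-up groups (the numbers are invented for illustration), one of which contains an obvious outlier:

```r
# Hypothetical data: group "a" has one value far from the rest.
groups <- list(a = c(3, 5, 6, 7, 9, 40),
               b = c(10, 12, 14, 16, 18))
B <- boxplot(groups, plot = FALSE)

B$stats             # one column per group, five rows
B$n                 # 6 and 5 observations
B$out               # 40, the point beyond the whiskers
B$group             # 1, i.e. the outlier belongs to the first group
B$names[B$group]    # "a"
```

Note that the upper whisker of group "a" stops at 9, the largest value that is not flagged as an outlier, so the last row of stats is not always the maximum of the data.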

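So the table of stats is in A$stats: each column belongs to one group and contains the lower whisker end, lower hinge, median, upper hinge, and upper whisker end. When a group has no outliers (as in your example, where out is empty), the whisker ends are simply the min and max, and the hinges are close to, though not always identical to, the lower and upper quartiles.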

You could do:

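A <- boxplot(..., plot=FALSE)   # same arguments as your plotting call, minus the drawing
mytable <- A$stats              # 5 x (number of groups) matrix
colnames(mytable)<-A$names      # label the columns with the group names
rownames(mytable)<-c('min','lower quartile','median','upper quartile','max')
mytable 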

which returns (for mytable):

                      U       UM        M       LM        L
min            178.9983 216.5999 189.8641 209.4726 202.0571
lower quartile 182.2274 217.0550 201.8764 209.5265 207.3756
median         202.1085 228.5095 219.5254 214.7852 220.0937
upper quartile 220.3754 267.0707 234.2601 230.0274 226.2467
max            221.9904 284.8324 279.3434 240.0647 240.3436

Then you can refer to it like mytable['min','U'].
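If a data frame with one row per group is easier to work with (for filtering or exporting), the matrix can be transposed and converted. The sketch below rebuilds the table from the rounded values printed above; the short column labels (q1, q3) and the name by_group are my own choices for illustration:

```r
# Rounded values copied from the printed table; one column per group.
stats <- matrix(
  c(178.9983, 182.2274, 202.1085, 220.3754, 221.9904,
    216.5999, 217.0550, 228.5095, 267.0707, 284.8324,
    189.8641, 201.8764, 219.5254, 234.2601, 279.3434,
    209.4726, 209.5265, 214.7852, 230.0274, 240.0647,
    202.0571, 207.3756, 220.0937, 226.2467, 240.3436),
  nrow = 5,
  dimnames = list(c("min", "q1", "median", "q3", "max"),
                  c("U", "UM", "M", "LM", "L"))
)

# Transpose so each group becomes a row, then convert to a data frame.
by_group <- as.data.frame(t(stats))
by_group$group <- rownames(by_group)

by_group["U", "min"]                   # single value for one group
by_group[by_group$median > 215, ]      # groups with a median above 215
```

From here, something like write.csv(by_group, "stats.csv") would export the table, where the file name is only a placeholder.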

OTHER TIPS

If you really want quantiles of your data instead of boxplot numbers, using quantile directly would be my choice (it is far easier to read if you look through what you did later).

quantile (x, probs = c (0, .25, .5,.75, 1))

quantile itself does not work with groups, but you can combine it with aggregate, which calls it once for each group defined by the by argument. by needs to be a list, which also lets you combine several grouping factors. Here chondro is an example data set with a numeric column x and a factor clusters:

aggregate (chondro$x, by = list (chondro$clusters), 
           FUN = quantile, probs = c (0, .25, .5,.75, 1))

with the result:

   Group.1   x.0%  x.25%  x.50%  x.75% x.100%
1  matrix -11.55  -6.55   5.45  14.45  22.45
2  lacuna -11.55  -2.55   4.45  10.45  22.45
3    cell  -8.55  -1.55  11.45  15.45  20.45

If you really want to have boxplot numbers (e.g. how far the whiskers go), have a look at ? fivenum and ? boxplot.stats.
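The difference between the three is easiest to see on a small vector with one extreme value. The data below are made up for illustration:

```r
x <- c(1, 2, 3, 4, 5, 100)

quantile(x, probs = c(0, .25, .5, .75, 1))  # 1, 2.25, 3.5, 4.75, 100
fivenum(x)                                  # 1, 2, 3.5, 5, 100 (hinges, not quartiles)

bs <- boxplot.stats(x)
bs$stats                                    # 1, 2, 3.5, 5, 5 (whisker stops at 5)
bs$out                                      # 100 is reported separately
```

So quantile and fivenum can disagree slightly on the middle values, and boxplot.stats additionally pulls the whisker ends in to the most extreme points that are not outliers.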

Others have answered the specific question about the object that boxplot returns. I would just add that if you want to learn about return objects in general, you should learn about lists and about the str function, which generally gives a much more readable view of an object than the structure(...) output you show above. There is also the TkListView function in the TeachingDemos package, which gives a more interactive way to explore lists and other objects. Using str, names, and subsetting (see help("[")) will give you a feel for what is in a return object and how to access the pieces you want; the help page for the function that created the object is also a good place to start.
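A quick sketch of that workflow, using the built-in InsectSprays data set as a stand-in for the asker's data (an assumption on my part):

```r
# InsectSprays ships with R: insect counts for six sprays, A to F.
A <- boxplot(count ~ spray, data = InsectSprays, plot = FALSE)

str(A)                       # compact overview: element names, types, dimensions
names(A)                     # just the element names
A[["stats"]][3, ]            # third row of stats: the median of each group
A$stats[, A$names == "C"]    # all five numbers for spray "C"
```

str shows at a glance that stats is a 5-row matrix and that out and group have the same length, which is usually enough to work out how to subset the pieces.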

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow