tapply and wrong summary statistics for some factors

https://stackoverflow.com/questions/21963134

15-10-2022

Question

I am trying to understand the results of summary when it is used with tapply. In the following example, the summary statistics look wrong for factor level "Reg2": min reports 11123, but summary reports 11120. Could someone help us understand this behavior?

> edf=data.frame(pri=c(8258, 14253, 11123, 11311),
             reg=c("Reg1", "Reg2", "Reg2", "Reg1"))
> tapply(edf$pri, edf$reg, sum)
 Reg1  Reg2 
19569 25376 
> tapply(edf$pri, edf$reg, length)
Reg1 Reg2 
   2    2 
> tapply(edf$pri, edf$reg, mean)
   Reg1    Reg2 
 9784.5 12688.0 
> tapply(edf$pri, edf$reg, min)
 Reg1  Reg2 
 8258 11123 
> tapply(edf$pri, edf$reg, summary)
$Reg1
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   8258    9021    9784    9784   10550   11310 

$Reg2
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  11120   11910   12690   12690   13470   14250 

> by(edf$pri, edf$reg, summary)
edf$reg: Reg1
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   8258    9021    9784    9784   10550   11310 

edf$reg: Reg2
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  11120   11910   12690   12690   13470   14250 
> do.call("rbind",tapply(edf$pri, edf$reg, summary))
      Min. 1st Qu. Median  Mean 3rd Qu.  Max.
Reg1  8258    9021   9784  9784   10550 11310
Reg2 11120   11910  12690 12690   13470 14250
> str(edf)
'data.frame':   4 obs. of  2 variables:
 $ pri: num  8258 14253 11123 11311
 $ reg: Factor w/ 2 levels "Reg1","Reg2": 1 2 2 1

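Answer

The statistics are not wrong, only rounded. With default options, summary in the R version shown in the question keeps four significant digits, so 11123 appears as 11120 and 11311 as 11310. The digits argument controls this; passing a larger value, as in the call after the excerpt, shows the full values. From ?summary: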

  digits: integer, used for number formatting with ‘signif()’ (for
          ‘summary.default’) or ‘format()’ (for ‘summary.data.frame’).
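
The rounding can be reproduced in isolation with signif(), which the excerpt above names as the function behind summary.default. This is a minimal sketch using values from the question, and it assumes four significant digits, which is what the printed output in the question suggests:

```r
# Values from the question: two raw prices and the exact third quartile of Reg1
x <- c(11123, 14253, 10547.75)

# Keep only four significant digits
rounded <- signif(x, digits = 4)
print(rounded)
# [1] 11120 14250 10550
```

These are the same values that appear in the summary output of the question, which suggests that the underlying data were never altered.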

tapply(edf$pri, edf$reg, summary, digits = 42)

## $Reg1
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##  8258.00  9021.25  9784.50  9784.50 10547.75 11311.00 

## $Reg2
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 11123.0 11905.5 12688.0 12688.0 13470.5 14253.0
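
If relying on the digits argument is not desirable, the same six statistics can be computed directly from quantile() and mean(), which do not round. The sketch below is one possible approach rather than part of the original answer; the helper name full_summary and its column labels are hypothetical:

```r
# Data from the question; reg is made a factor explicitly
edf <- data.frame(pri = c(8258, 14253, 11123, 11311),
                  reg = factor(c("Reg1", "Reg2", "Reg2", "Reg1")))

# Hypothetical helper: the statistics reported by summary(), left unrounded
full_summary <- function(x) {
  q <- quantile(x, names = FALSE)  # 0%, 25%, 50%, 75%, 100%
  c(Min = q[1], Q1 = q[2], Median = q[3],
    Mean = mean(x), Q3 = q[4], Max = q[5])
}

# One row per level of reg, as in the do.call("rbind", ...) call of the question
res <- do.call(rbind, tapply(edf$pri, edf$reg, full_summary))
print(res)
```

The result is a numeric matrix, so individual values such as res["Reg2", "Min"] can be used in further calculations without any formatting step in between.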

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow