How to find group-wise summary statistics for an R data frame?

To compare groups, we need summary statistics for each group, since these let us see how the groups differ. In R, the summary function applied to a numeric vector returns the minimum, first quartile, median, mean, third quartile, and maximum, so each of these values can be compared across groups. To find group-wise summary statistics for a column of an R data frame, we can use the tapply function with summary as the function to apply.
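As a rough sketch of what tapply does in this setting, the call can be read as "split the values by group, then apply the function to each piece". The toy vectors scores and groups below are hypothetical and are not part of the article's data; they only illustrate the call shape.

```r
# Hypothetical toy data (not from the article): six scores in two groups.
scores <- c(10, 20, 30, 5, 15, 25)
groups <- c("A", "A", "A", "B", "B", "B")

# tapply(X, INDEX, FUN): split X by INDEX, then apply FUN to each piece.
by_tapply <- tapply(scores, groups, summary)

# The same idea written out by hand with split() and lapply().
by_split <- lapply(split(scores, groups), summary)

# Both approaches give the same six-number summary for group A.
print(by_tapply[["A"]])
print(by_split[["A"]])
```

The first argument is the column to summarise, the second is the grouping column, and the third is the function applied to each group.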

Example

Consider the following data frame −

> set.seed(99)
> x1<-sample(1:100,50,replace=TRUE)
> x2<-rep(c("G1","G2","G3","G4","G5"),times=10)
> df<-data.frame(x1,x2)
> head(df,20)
    x1 x2
1   48 G1
2   33 G2
3   44 G3
4   22 G4
5   99 G5
6   62 G1
7   98 G2
8   32 G3
9   13 G4
10  20 G5
11 100 G1
12  31 G2
13  68 G3
14   9 G4
15  82 G5
16  88 G1
17  30 G2
18  86 G3
19  84 G4
20  32 G5

Finding the summary statistics of x1 for each group −

> tapply(df$x1, df$x2, summary)
$G1
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   14.0    55.0    72.0    67.8    86.5   100.0

$G2
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    4.0    31.5    60.5    52.4    69.5    98.0

$G3
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   14.0    33.5    41.0    46.9    64.5    86.0

$G4
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   9.00   23.75   53.00   53.30   82.75   97.00

$G5
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   7.00   31.25   32.00   42.40   44.75   99.00
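Because tapply returns one summary per group, comparing groups means reading several separate blocks. One possible way to line them up, sketched here as an optional extra rather than as part of the original example, is to stack the summaries into a single matrix with do.call and rbind. The names group_summaries and summary_table are introduced only for this illustration, and the sampled values may differ between R versions because the default behaviour of sample() has changed over time.

```r
# Rebuild the article's data frame (sampled values can depend on the R version).
set.seed(99)
x1 <- sample(1:100, 50, replace = TRUE)
x2 <- rep(c("G1", "G2", "G3", "G4", "G5"), times = 10)
df <- data.frame(x1, x2)

# Per-group summaries, exactly as in the example above.
group_summaries <- tapply(df$x1, df$x2, summary)

# Stack the five summaries into one matrix: one row per group,
# one column per statistic (Min., 1st Qu., Median, Mean, 3rd Qu., Max.).
summary_table <- do.call(rbind, group_summaries)
print(summary_table)
```

With the groups as rows, a single column such as summary_table[, "Median"] can be read top to bottom to compare the groups on that statistic.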

Let’s have a look at one more example −

> y1<-rep(c(letters[1:5]),times=5)
> y2<-rep(c(14,25,13,12,41,52,44,28,17,30),times=c(2,5,3,3,1,5,1,2,2,1))
> df_y<-data.frame(y1,y2)
> head(df_y,20)
   y1 y2
1   a 14
2   b 14
3   c 25
4   d 25
5   e 25
6   a 25
7   b 25
8   c 13
9   d 13
10  e 13
11  a 12
12  b 12
13  c 12
14  d 41
15  e 52
16  a 52
17  b 52
18  c 52
19  d 52
20  e 44
> tapply(df_y$y2, df_y$y1, summary)
$a
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   12.0    14.0    25.0    26.2    28.0    52.0

$b
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   12.0    14.0    25.0    26.2    28.0    52.0

$c
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   12.0    13.0    17.0    23.8    25.0    52.0

$d
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   13.0    17.0    25.0    29.6    41.0    52.0

$e
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   13.0    25.0    30.0    32.8    44.0    52.0
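When only one statistic is of interest, the same tapply call works with a different function in place of summary. The sketch below rebuilds df_y from the example and computes just the group means; the names group_means and top_group are introduced here for illustration only.

```r
# Rebuild the second example's data frame.
y1 <- rep(letters[1:5], times = 5)
y2 <- rep(c(14, 25, 13, 12, 41, 52, 44, 28, 17, 30),
          times = c(2, 5, 3, 3, 1, 5, 1, 2, 2, 1))
df_y <- data.frame(y1, y2)

# Group-wise mean only: same call shape as before, with mean instead of summary.
group_means <- tapply(df_y$y2, df_y$y1, mean)
print(group_means)
# a = 26.2, b = 26.2, c = 23.8, d = 29.6, e = 32.8

# Name of the group with the largest mean.
top_group <- names(group_means)[which.max(group_means)]
print(top_group)
# "e"
```

These means agree with the Mean entries in the summary output above, which is a quick way to check that the grouping behaves as expected.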
raja
Published on 11-Aug-2020 16:57:20