Question

I loaded a data set called gob into R and tried the handy summary function. Note that the 3rd quartile is less than the mean. How can this be? Is it the size of my data or something else like that?

I already tried passing in a large value for the digits parameter (e.g. 10), and that does not resolve the issue.

> summary(gob, digits=10)

   customer_id         100101.D            100199.D            100201.D        
 Min.   :   1083   Min.   :0.0000000   Min.   :0.0000000   Min.   :0.0000000  
 1st Qu.: 965928   1st Qu.:0.0000000   1st Qu.:0.0000000   1st Qu.:0.0000000  
 Median :2448738   Median :0.0000000   Median :0.0000000   Median :0.0000000  
 Mean   :2660101   Mean   :0.0010027   Mean   :0.0013348   Mean   :0.0000878  
 3rd Qu.:4133368   3rd Qu.:0.0000000   3rd Qu.:0.0000000   3rd Qu.:0.0000000  
 Max.   :6538193   Max.   :1.0000000   Max.   :1.0000000   Max.   :0.7520278  

Note that for gob$100201.D the mean is 0.0000878 but the 3rd Qu. = 0.


Solution

It is not a bug; your data just contains a lot of 0 values. For example, if I make x with twelve 0s and one 1, the 3rd quartile comes out smaller than the mean:

x <- c(0,0,0,0,0,0,0,0,0,0,0,0,1)
summary(x)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.00000 0.00000 0.00000 0.07692 0.00000 1.00000 

Try using table() on your column to see the distribution of values:

table(x)
x
 0  1 
12  1 
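
The same check can also be expressed as a proportion. For non-negative data like this, if more than 75% of the values are 0, the 3rd quartile has to be 0, while a single positive value is enough to pull the mean above 0. A minimal sketch using the x from above; the names zero_share, q3 and m are only illustrative, and on the real data you would substitute your own column for x:

```r
# Thirteen values: twelve zeros and a single one (same x as above)
x <- c(rep(0, 12), 1)

# Share of observations that are exactly zero
zero_share <- mean(x == 0)

# 3rd quartile (names dropped) and the mean, for comparison
q3 <- unname(quantile(x, 0.75))
m  <- mean(x)

print(zero_share > 0.75)  # more than three quarters of the values are 0
print(q3 < m)             # so the 3rd quartile sits below the mean
```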

OTHER TIPS

The 3rd quartile can be lower than the mean. It is not 75% of the highest value; it is the value found 75% of the way through the vector once it is ordered from lowest to highest. In other words:

Vector <- c(0,0,0,0,0,0,0,1)
mean(Vector)
[1] 0.125
quantile(Vector, 0.75)
75% 
  0 

To find the 3rd quartile, R orders all the data from lowest to highest, then picks the value at roughly 75% of the length of that vector. So basically:

third_quartile <- sort(Vector)[round(length(Vector) * 0.75)]

(Note that if the position lands between two elements, R's default method interpolates linearly between the two neighbouring values rather than picking one. But this is the basic idea.)
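
To make the "lands between two positions" case concrete, here is a rough sketch of the calculation done by hand. It assumes the default method of quantile() (type = 7, as far as I know), which interpolates linearly between the two neighbouring sorted values. The helper name third_quartile_by_hand is made up for illustration and is not part of R:

```r
# Same example vector as above: seven zeros and a single one
Vector <- c(0, 0, 0, 0, 0, 0, 0, 1)

# Hypothetical helper: quantile via linear interpolation,
# mirroring what quantile() does by default (type = 7)
third_quartile_by_hand <- function(v, p = 0.75) {
  s  <- sort(v)                  # order from lowest to highest
  h  <- (length(s) - 1) * p + 1  # fractional position in the sorted vector
  lo <- floor(h)                 # index just below that position
  hi <- ceiling(h)               # index just above that position
  s[lo] + (h - lo) * (s[hi] - s[lo])
}

by_hand  <- third_quartile_by_hand(Vector)
built_in <- unname(quantile(Vector, 0.75))

print(isTRUE(all.equal(by_hand, built_in)))  # both approaches agree here
```

For this vector the position is 6.25, and both the 6th and 7th sorted values are 0, so the result is 0 either way.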

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow