Question

Consider the following vector:

vec = rnorm(1000)

I would like to compute the quintiles of this vector, and then average the vector values for each quintile.

I know the way of getting the quantiles is:

qtle = quantile(vec, seq(from = 0, to = 1, by = 0.2))

but I am not sure how to compute the mean of the values within each quantile (i.e. the mean of the bottom 20%, the mean of the next 20%, etc.) in an efficient manner.

Any ideas?

Thanks.


Solution

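You can use findInterval and tapply for this. Note that the output has six groups rather than five: the maximum equals the last break, so findInterval puts it in an interval of its own.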

set.seed(1)
vec = rnorm(1000)
qs <- quantile(vec, seq(from = 0, to = 1, by = 0.2))
tapply(vec, findInterval(vec, qs), mean)
#        1        2        3        4        5        6 
# -1.46746 -0.54260 -0.02399  0.54492  1.41894  3.81028 
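
The sixth group in the output above can be folded into the fifth. A minimal sketch, assuming the `rightmost.closed = TRUE` argument of findInterval is acceptable for your data (the names `grp` and `quintile_means` are only illustrative):

```r
set.seed(1)
vec <- rnorm(1000)
qs <- quantile(vec, seq(from = 0, to = 1, by = 0.2))

# rightmost.closed = TRUE treats the last interval as closed on both ends,
# so the maximum is assigned to group 5 instead of a separate group 6
grp <- findInterval(vec, qs, rightmost.closed = TRUE)
quintile_means <- tapply(vec, grp, mean)

table(grp)       # group sizes
quintile_means   # one mean per quintile
```

`cut(vec, qs, include.lowest = TRUE)` should give a comparable grouping if you prefer labelled intervals.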

OTHER TIPS

The solution above has some shortcomings: when the vector contains many repeated values, or has an odd number of elements, findInterval does not group the values the way we want.
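
As an illustration of the repeated-values case, here is a small sketch with a made-up vector (`tied` is a hypothetical example, not data from the question). Several breaks coincide, so some groups stay empty:

```r
# hypothetical vector in which more than half of the values are tied
tied <- c(rep(0, 6), 1:4)
qs <- quantile(tied, seq(from = 0, to = 1, by = 0.2))

# the first three breaks are all 0, so no value can land in group 1 or 2
grp <- findInterval(tied, qs)

qs
table(grp)
```

The six zeros all land in a single group, so the groups no longer hold 20% of the data each.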

Here is my simple solution, which groups by rank in the sorted vector instead of by quantile value:

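# Mean of the sorted values whose rank fraction lies in (value, value2];
# returns 0 when no element falls in that range.
averageQuantile <- function(vec, value, value2) {
  chunk <- getChunkOfVector(vec, value, value2)
  if (length(chunk) > 0) {
    return(mean(chunk))
  }
  return(0.0)
}

# Sort the vector and keep the elements at positions k with
# value < k / length(vector) <= value2.
getChunkOfVector <- function(vector, value, value2) {
  len <- length(vector)
  vector <- sort(vector)
  # fraction of the original length reached at each sorted position
  frac <- seq_along(vector) / len
  vector[frac > value & frac <= value2]
}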

So if you simply need the average of the values between quantile(x, 0.25) and quantile(x, 0.5):

set.seed(1)
vec = rnorm(1000)
averageQuantile(vec, 0.25, 0.50)
# [1] -0.3397659
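
To get all five quintile means in one pass with the same rank-based idea, something along these lines may work. It is only a sketch: `n_groups`, `rank_group` and `rank_means` are illustrative names, and it assumes you want groups of (nearly) equal size by sorted position.

```r
set.seed(1)
vec <- rnorm(1000)
n_groups <- 5

sorted <- sort(vec)
# map sorted positions 1..n onto groups 1..n_groups of (nearly) equal size;
# multiplying before dividing keeps exact group boundaries exact
rank_group <- ceiling(seq_along(sorted) * n_groups / length(sorted))
rank_means <- tapply(sorted, rank_group, mean)

table(rank_group)  # group sizes
rank_means         # mean of each rank-based quintile
```

Because the groups are defined by position rather than by value, repeated values cannot leave a group empty, although tied values may then be split across neighbouring groups.
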
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow