Problem

Consider the following vector:

vec = rnorm(1000)

I would like to compute the quintiles of this vector, and then average the vector values for each quintile.

I know the way of getting the quantiles is:

qtle = quantile(vec, seq(from = 0, to = 1, by = 0.2))

but I am not sure how to compute the mean of the values within each quantile (i.e. the mean of the bottom 20%, the mean of the next 20%, etc.) in an efficient manner.

Any ideas?

Thanks.


Solution

You can use findInterval and tapply for this.

set.seed(1)
vec = rnorm(1000)
qs <- quantile(vec, seq(from = 0, to = 1, by = 0.2))
tapply(vec, findInterval(vec, qs), mean)
#        1        2        3        4        5        6 
# -1.46746 -0.54260 -0.02399  0.54492  1.41894  3.81028 
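
In the output above, the sixth group holds only the maximum of vec, because findInterval treats each interval as closed on the left and open on the right by default. A minimal sketch of one way to get exactly five groups, assuming the maximum should be counted in the top quintile, is to pass rightmost.closed = TRUE:

```r
set.seed(1)
vec <- rnorm(1000)
qs <- quantile(vec, seq(from = 0, to = 1, by = 0.2))

# rightmost.closed = TRUE makes the last interval include its upper bound,
# so max(vec) is assigned to group 5 rather than to a group of its own.
grp <- findInterval(vec, qs, rightmost.closed = TRUE)
quintile_means <- tapply(vec, grp, mean)
quintile_means
```

Using cut(vec, qs, include.lowest = TRUE) as the grouping factor should give a similar five-way split, although cut closes its intervals on the right by default, so values exactly equal to a break point may be grouped differently.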

Other tips

The above solution has some shortcomings: when the vector contains many identical values, or has an odd number of elements, the findInterval method does not split it the way we would wish.
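
To illustrate the repeated-values case, here is a small sketch with a made-up vector (the values are an assumption chosen only for demonstration). Several quantile break points coincide, so the quantile-based grouping cannot produce five groups:

```r
# Made-up example vector: eight identical values plus two larger ones.
x <- c(rep(0, 8), 1, 2)
qs <- quantile(x, seq(from = 0, to = 1, by = 0.2))

# Several break points coincide at 0, so findInterval cannot separate the
# repeated values: all eight zeros end up in the same group.
grp <- findInterval(x, qs, rightmost.closed = TRUE)
table(grp)
```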

Here is my simple solution:

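# Mean of the values whose relative position in the sorted vector lies in
# (value, value2]; returns 0 when no element falls in that range.
averageQuantile <- function(vec, value, value2) {
  chunk <- getChunkOfVector(vec, value, value2)
  if (length(chunk) > 0) {
    return(mean(chunk))
  }
  return(0.0)
}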

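# Return the elements of the sorted vector whose relative position k/len
# lies in the interval (value, value2].
getChunkOfVector <- function(vector, value, value2) {
  len <- length(vector)
  sorted <- sort(vector)
  # Relative position of each sorted element: 1/len, 2/len, ...
  pos <- seq_along(sorted) / len
  sorted[pos > value & pos <= value2]
}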

So if you simply need the average of the values between quantile(x, 0.25) and quantile(x, 0.5):

set.seed(1)
vec = rnorm(1000)
averageQuantile(vec, 0.25, 0.50)
# [1] -0.3397659
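
The same position-based idea can be extended to return every group mean in one call, which is what the question asks for. The following is only a sketch: quantileMeans and its n_groups parameter are hypothetical names, not part of the answer above, and the grouping rule is an assumption modelled on the k/len test in getChunkOfVector.

```r
set.seed(1)
vec <- rnorm(1000)

# Hypothetical helper (not part of the original answer): split the sorted
# vector into n_groups chunks by position and average each chunk.
quantileMeans <- function(vec, n_groups = 5) {
  sorted <- sort(vec)
  # Position k goes to group ceiling(n_groups * k / length), which follows
  # the same "k/len > lower & k/len <= upper" idea as getChunkOfVector.
  grp <- ceiling(n_groups * seq_along(sorted) / length(sorted))
  tapply(sorted, grp, mean)
}

quantileMeans(vec)
```

Because the groups are formed by position rather than by value, repeated values and lengths that do not divide evenly still yield n_groups groups whose sizes differ by at most one, provided the vector has at least n_groups elements.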
License: CC-BY-SA with attribution
Not affiliated with StackOverflow