Question

I wrote a function that assigns each element of a vector to its decile. I plan to use the result in graphics that evaluate the efficacy of a predictive model. There has to be an easier way to do this, but I haven't been able to figure it out. Does anyone have an idea how I could score a vector this way without so many nested ifelse() statements? I have included the function as well as some code to reproduce my results.

# function
decile <- function(x){
  deciles <- vector(length=10)
  for (i in seq(0.1,1,.1)){
    deciles[i*10] <- quantile(x, i)
  }
  return (ifelse(x<deciles[1], 1,
         ifelse(x<deciles[2], 2,
                ifelse(x<deciles[3], 3,
                       ifelse(x<deciles[4], 4,
                              ifelse(x<deciles[5], 5,
                                     ifelse(x<deciles[6], 6,
                                            ifelse(x<deciles[7], 7,
                                                  ifelse(x<deciles[8], 8,
                                                         ifelse(x<deciles[9], 9, 10))))))))))
}

# check functionality
test.df <- data.frame(a = 1:10, b = rnorm(10, 0, 1))

test.df$deciles <- decile(test.df$b)

test.df

# order data frame
test.df[with(test.df, order(b)),]

Solution

You can use quantile() to find the decile cut points and findInterval() to assign each value to its decile:

# find the decile locations 
decLocations <- quantile(test.df$b, probs = seq(0.1,0.9,by=0.1))
# use findInterval with -Inf and Inf as lower and upper bounds
findInterval(test.df$b,c(-Inf,decLocations, Inf))
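
The two lines above can be wrapped into a drop-in replacement for the original decile() function. This is only a minimal sketch: it assumes x is numeric with no NA values, and set.seed(1) is an arbitrary choice added to make the example reproducible.

```r
# Sketch: quantile() + findInterval() instead of nested ifelse() calls.
# Assumes x is numeric and contains no NA values.
decile <- function(x) {
  # Nine interior cut points at the 10th, 20th, ..., 90th percentiles
  breaks <- quantile(x, probs = seq(0.1, 0.9, by = 0.1), names = FALSE)
  # -Inf and Inf close the outer intervals, so every value falls in 1..10
  findInterval(x, c(-Inf, breaks, Inf))
}

set.seed(1)  # arbitrary seed, only for reproducibility
test.df <- data.frame(a = 1:10, b = rnorm(10, 0, 1))
test.df$deciles <- decile(test.df$b)

# Ordered by b, the decile column should run from 1 to 10
test.df[order(test.df$b), ]
```

findInterval() treats each interval as closed on the left, so a value equal to a cut point goes into the higher decile, which matches the strict x < deciles[k] comparisons in the original function.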

OTHER TIPS

Another solution is to use ecdf(), described in the help files as the inverse of quantile().

round(ecdf(test.df$b)(test.df$b) * 10)

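Note that @mnel's findInterval() solution above is around 100 times faster. Also, because of round(), the ecdf() version can return 0 for the smallest values of a longer vector, so the two results do not always agree.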
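
To check the speed difference on your own machine, something like the following sketch could be used. The function names by_interval and by_ecdf, the vector size, and the repetition count are all hypothetical choices for illustration, and the timings will vary by machine and R version.

```r
# Sketch: compare the two approaches on a larger vector.
set.seed(42)        # arbitrary seed, only for reproducibility
x <- rnorm(1e5)     # hypothetical test data

by_interval <- function(x) {
  findInterval(x, c(-Inf, quantile(x, probs = seq(0.1, 0.9, by = 0.1)), Inf))
}
by_ecdf <- function(x) round(ecdf(x)(x) * 10)

# Rough timings using base R only; repeat each call to smooth out noise
system.time(for (i in 1:20) by_interval(x))
system.time(for (i in 1:20) by_ecdf(x))

# The two approaches do not produce the same range of scores
range(by_interval(x))  # 1 to 10
range(by_ecdf(x))      # 0 to 10, because round() sends the lowest values to 0
```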

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow