You can use quantile
and findInterval
# find the decile locations
decLocations <- quantile(test.df$b, probs = seq(0.1,0.9,by=0.1))
# use findInterval with -Inf and Inf as upper and lower bounds
findInterval(test.df$b,c(-Inf,decLocations, Inf))
Question
I wrote a function that calculates the deciles of each row in a vector. I am doing this with the intention of creating graphics to evaluate the efficacy of a predictive model. There has to be a easier way to do this, but I haven't been able to figure it out for a while. Does anyone have any idea how I could score a vector in this way without having so many nested ifelse() statements? I included the function as well as some code to copy my results.
# function
decile <- function(x){
deciles <- vector(length=10)
for (i in seq(0.1,1,.1)){
deciles[i*10] <- quantile(x, i)
}
return (ifelse(x<deciles[1], 1,
ifelse(x<deciles[2], 2,
ifelse(x<deciles[3], 3,
ifelse(x<deciles[4], 4,
ifelse(x<deciles[5], 5,
ifelse(x<deciles[6], 6,
ifelse(x<deciles[7], 7,
ifelse(x<deciles[8], 8,
ifelse(x<deciles[9], 9, 10))))))))))
}
# check functionality
test.df <- data.frame(a = 1:10, b = rnorm(10, 0, 1))
test.df$deciles <- decile(test.df$b)
test.df
# order data frame
test.df[with(test.df, order(b)),]
Solution
You can use quantile
and findInterval
# find the decile locations
decLocations <- quantile(test.df$b, probs = seq(0.1,0.9,by=0.1))
# use findInterval with -Inf and Inf as upper and lower bounds
findInterval(test.df$b,c(-Inf,decLocations, Inf))
OTHER TIPS
Another solution is to use ecdf()
, described in the help files as the inverse of quantile()
.
round(ecdf(test.df$b)(test.df$b) * 10)
Note that @mnel's solution is around 100 times faster.