Question

As the title says, I would like to select the highest 10% and the lowest 10% of the values in a vector. How can I do that?

Can anyone help me? Thanks a lot.

No correct solution

OTHER TIPS

Here is an example that takes roughly 10% from each end:

v <- rnorm(100)
sort(v)[1:(length(v)/10)]                  # lowest, in increasing order.
sort(v, decreasing=TRUE)[1:(length(v)/10)] # highest, in decreasing order.
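
If the length is not a multiple of 10, it may help to make the count explicit and sort only once. This is just a sketch of the same idea: the variable k and the choice to round up with ceiling() are my own assumptions, not part of the answer above.

```r
# Sketch: sort once, then take k values from each end.
# k is a hypothetical helper: 10% of the length, rounded up (an assumption).
set.seed(1)
v <- rnorm(105)
k <- ceiling(length(v) / 10)

s <- sort(v)                # sort() drops NA values by default
lowest  <- head(s, k)       # k smallest values, in increasing order
highest <- rev(tail(s, k))  # k largest values, in decreasing order
```

Using floor() instead of ceiling() would round the count down; which one you want depends on how you define "10%" for lengths that are not divisible by 10.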

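This returns a vector containing the bottom and top 10% of x, in their original order. Note that with 100 values the conditions q<0.1 | q>=0.9 select the 9 lowest and the 11 highest values (see the output); use q<=0.1 | q>0.9 if you want 10 of each: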

> set.seed(123)
> x<-rnorm(100)
> x[{q<-rank(x)/length(x);q<0.1 | q>=0.9}]
 [1]  1.558708  1.715065 -1.265061  1.786913 -1.966617 -1.686693 -1.138137
 [8]  1.253815 -1.265396  2.168956 -1.123109  1.368602  1.516471 -1.548753
[15]  2.050085 -2.309169 -1.220718  1.360652  2.187333  1.532611

Note that a full sort can be slow. For small vectors you won't notice, but for very large vectors sorting the entire vector is expensive, and you don't need a full sort to get the two tails.

See the partial argument on the help page for sort and sort.int for how to do a partial sort, which can still give you the top and bottom 10% without a full sort. The quantile function uses partial sorting internally, so it should be faster than a full sort in some cases, but doing the partial sort yourself can eliminate some of the quantile overhead and give a bit more speed.
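
To make the partial-sort idea concrete, here is a minimal sketch. The function name tails_partial, its prop argument, and the rounding with ceiling() are hypothetical choices for illustration. It relies on sort(x, partial = idx) placing the elements at the given indices in their sorted positions, with smaller values before and larger values after; the values inside each tail are not themselves ordered.

```r
# Sketch: get both tails with a partial sort instead of a full sort.
# tails_partial and prop are hypothetical names, not from base R.
tails_partial <- function(x, prop = 0.1) {
  x <- x[!is.na(x)]               # drop NAs up front so indices are stable
  n <- length(x)
  k <- ceiling(n * prop)          # assumed rounding rule for "10%"
  stopifnot(k >= 1, 2 * k <= n)   # the two tails must not overlap
  # After this call, p[k] and p[n - k + 1] are in their sorted positions.
  p <- sort(x, partial = c(k, n - k + 1))
  list(lowest  = p[seq_len(k)],       # k smallest, not ordered among themselves
       highest = p[(n - k + 1):n])    # k largest, not ordered among themselves
}

set.seed(123)
x <- rnorm(100)
res <- tails_partial(x)

# For comparison, the quantile-based version, which keeps the original order:
lo <- quantile(x, 0.1)
hi <- quantile(x, 0.9)
in_tails <- x[x <= lo | x >= hi]
```

Whether the partial sort is actually faster than quantile or a full sort for your data is worth timing yourself, for example with system.time on a vector of realistic size.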

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow