How to find top n% of records in a column of a dataframe using R

https://stackoverflow.com/questions/1563961

r
dataframe

21-09-2019

Question

I have a dataset showing the exchange rate of the Australian Dollar versus the US dollar once a day over a period of about 20 years. I have the data in a data frame, with the first column being the date, and the second column being the exchange rate. Here's a sample from the data:

> data
             V1     V2
1    12/12/1983 0.9175
2    13/12/1983 0.9010
3    14/12/1983 0.9000
4    15/12/1983 0.8978
5    16/12/1983 0.8928
6    19/12/1983 0.8770
7    20/12/1983 0.8795
8    21/12/1983 0.8905
9    22/12/1983 0.9005
10   23/12/1983 0.9005

How would I go about displaying the top n% of these records? E.g. say I want to see the days and exchange rates for those days where the exchange rate falls in the top 5% of all exchange rates in the dataset?

Solution

For the top 5%:

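n <- 5  # percentage of records to keep
# Keep the rows whose V2 lies strictly above the (100 - n)th percentile
data[data$V2 > quantile(data$V2, probs = 1 - n / 100), ]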
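To make the quantile approach easy to try, here is a minimal sketch that applies it to the ten sample rows from the question. The helper name top_percent and its arguments are hypothetical, introduced only for illustration; 20% is used instead of 5% so that the tiny sample returns more than one row.

```r
# Sample rows copied from the question
data <- data.frame(
  V1 = c("12/12/1983", "13/12/1983", "14/12/1983", "15/12/1983", "16/12/1983",
         "19/12/1983", "20/12/1983", "21/12/1983", "22/12/1983", "23/12/1983"),
  V2 = c(0.9175, 0.9010, 0.9000, 0.8978, 0.8928,
         0.8770, 0.8795, 0.8905, 0.9005, 0.9005),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rows of `df` whose value in column `col`
# lies strictly above the (100 - n)th percentile of that column
top_percent <- function(df, col, n) {
  cutoff <- quantile(df[[col]], probs = 1 - n / 100, na.rm = TRUE)
  df[!is.na(df[[col]]) & df[[col]] > cutoff, , drop = FALSE]
}

top20 <- top_percent(data, "V2", 20)
print(top20)
```

On this sample the call keeps the two highest rates (12/12/1983 and 13/12/1983). Missing values are dropped here, which is an assumption about how NAs should be treated.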

Other tips

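Alternatively, for the top 5%, sort by the rate and take the first 5% of the rows:

# Sort by V2 in descending order and keep the first 5% of rows (rounded up)
head(data[order(data$V2, decreasing = TRUE), ], ceiling(0.05 * nrow(data)))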
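The quantile filter and the sort-and-head approach need not return the same rows: the first keeps every row above a cutoff value, while the second keeps a fixed number of rows. The sketch below uses made-up values (not from the question) with ties at the top to show the difference; the data frame x and its columns are hypothetical.

```r
# Made-up data with three tied values at the top
x <- data.frame(id = 1:9, value = c(1, 2, 3, 4, 5, 6, 9, 9, 9))
n <- 25  # percentage of records to keep

# Approach 1: rows strictly above the (100 - n)th percentile
by_cutoff <- x[x$value > quantile(x$value, probs = 1 - n / 100), ]

# Approach 2: the first n% of rows after sorting in descending order
k <- ceiling(n / 100 * nrow(x))
by_count <- head(x[order(x$value, decreasing = TRUE), ], k)

print(nrow(by_cutoff))  # the cutoff equals the tied maximum, so no row is strictly above it
print(nrow(by_count))   # always k rows
```

If tied values at the cutoff should be included, one option is to use >= instead of > in the first approach.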

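Another option is sqldf. The query has to sort by the exchange rate (V2) in descending order, not by the date (V1), so that LIMIT keeps the highest rates. Here the data frame is assumed to be named df:

library(sqldf)
# CAST truncates the computed row count to an integer for LIMIT
sqldf('SELECT * FROM df
       ORDER BY V2 DESC
       LIMIT (SELECT CAST(0.05 * COUNT(*) AS INTEGER) FROM df)
      ')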

You can change the fraction from 0.05 (5%) to any other value you need.

License: CC-BY-SA with attribution

Not affiliated with StackOverflow