How to find the top n% of records in a column of a data frame using R
Problem
I have a dataset showing the exchange rate of the Australian Dollar versus the US dollar once a day over a period of about 20 years. I have the data in a data frame, with the first column being the date, and the second column being the exchange rate. Here's a sample from the data:
>data
V1 V2
1 12/12/1983 0.9175
2 13/12/1983 0.9010
3 14/12/1983 0.9000
4 15/12/1983 0.8978
5 16/12/1983 0.8928
6 19/12/1983 0.8770
7 20/12/1983 0.8795
8 21/12/1983 0.8905
9 22/12/1983 0.9005
10 23/12/1983 0.9005
How would I go about displaying the top n% of these records? E.g. say I want to see the days and exchange rates for those days where the exchange rate falls in the top 5% of all exchange rates in the dataset?
Solution
For the top 5%:
n <- 5
# keep the rows whose rate exceeds the (100 - n)th percentile of V2
data[data$V2 > quantile(data$V2, probs = 1 - n/100), ]
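As a minimal sketch of the same idea, the quantile filter can be wrapped in a small function and tried on the ten sample rows from the question. The helper name top_pct and its arguments are made up for illustration; only quantile() and ordinary data frame indexing come from base R.

```r
# Sample rows from the question, rebuilt as a data frame
data <- data.frame(
  V1 = c("12/12/1983", "13/12/1983", "14/12/1983", "15/12/1983", "16/12/1983",
         "19/12/1983", "20/12/1983", "21/12/1983", "22/12/1983", "23/12/1983"),
  V2 = c(0.9175, 0.9010, 0.9000, 0.8978, 0.8928,
         0.8770, 0.8795, 0.8905, 0.9005, 0.9005),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rows whose value in `col` lies strictly above
# the (100 - n)th percentile of that column
top_pct <- function(df, col, n) {
  cutoff <- quantile(df[[col]], probs = 1 - n / 100, na.rm = TRUE)
  df[!is.na(df[[col]]) & df[[col]] > cutoff, , drop = FALSE]
}

top5 <- top_pct(data, "V2", 5)
print(top5)
```

On this small sample only the 12/12/1983 row (0.9175) lies above the 95th percentile. Because the comparison is strict, the number of rows returned depends on the data and on ties, so it need not be exactly n% of the rows.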
Other tips
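Alternatively, for the top 5%, sort by rate in descending order and take the first 5% of the rows:
# ceiling() makes the row count a whole number and returns at least one row
head(data[order(data$V2, decreasing = TRUE), ], ceiling(0.05 * nrow(data)))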
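The sort-and-take approach can be sketched on the sample values as well. Unlike a quantile filter, it always returns a fixed number of rows, so ties at the cutoff are split arbitrarily. The helper name top_rows and the choice to round up with ceiling() are assumptions made for this illustration, not part of the original answer.

```r
# Sample rates from the question; rows 9 and 10 are tied at 0.9005
data <- data.frame(
  V1 = c("12/12/1983", "13/12/1983", "14/12/1983", "15/12/1983", "16/12/1983",
         "19/12/1983", "20/12/1983", "21/12/1983", "22/12/1983", "23/12/1983"),
  V2 = c(0.9175, 0.9010, 0.9000, 0.8978, 0.8928,
         0.8770, 0.8795, 0.8905, 0.9005, 0.9005),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the k highest rows by `col`, where k is n% of the
# row count rounded up (so at least one row comes back for any n > 0)
top_rows <- function(df, col, n) {
  k <- ceiling(n / 100 * nrow(df))
  sorted <- df[order(df[[col]], decreasing = TRUE), , drop = FALSE]
  head(sorted, k)
}

top20 <- top_rows(data, "V2", 20)  # 2 of the 10 rows
top30 <- top_rows(data, "V2", 30)  # 3 rows, only one of the two tied 0.9005 rows
print(top20)
```

If every row tied at the cutoff should be kept, a threshold comparison such as the quantile filter is the more natural fit.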
Another solution is to use sqldf, sorting the data by the V2 value in descending order (the data frame is assumed to be named df here):
library(sqldf)
sqldf('SELECT * FROM df
ORDER BY V2 DESC
LIMIT (SELECT CAST(0.05 * COUNT(*) AS INTEGER) FROM df)
')
You can change the rate from 0.05 (5%) to any required rate.